LISTSERV 16.5 - CODE4LIB Archives

On Wed, May 6, 2015 at 8:15 AM, Ethan Gruber wrote:

> +1 on the RDFa and schema.org. For those that don't know the library URL
> off-hand, it is much easier to find a library website by Googling than it
> is to go through the central university portal, and the hours will show up
> at the top of the page after having been harvested by search engines.


Hi, so this is an area in which I've done, and am doing, a fair bit of work.
See http://stuff.coffeecode.net/2015/ola_white_hat_seo/#/1/10 for some fun
slides from a presentation I gave in January at the Ontario Library
Association SuperConference that show some ways data gets into
Google/Yahoo/Bing and conclude that the OCLC Registry "manually maintain
yet another copy of your data elsewhere" approach isn't working. (Hit "s"
to get speaker notes).

The rest of the presentation goes into depth on how to use RDFa to mark up
a real library web page with location, contact info, opening hours, and
event info. And I've posited that crawling library sites to pull
single-sourced data (e.g. you update your website to provide updated hours
to humans, and the machines automatically benefit) would be a much more
effective, accurate, and usable approach than maintaining copies of the
data in Google+, OCLC Registry, etc. We could produce results like
http://cwrc.ca/rsc-src/ that stay accurate, rather than being one-off
efforts that decay over time. (It would be great if the OCLC Registry had a
"crawl this URL" option so that it could keep all of its data up-to-date
and incentivize libraries to publish the data in a machine-readable format
such as RDFa + schema.org.)
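
A minimal sketch of the "crawl the library's own page" idea, in Python with
only the standard library. The sample markup, the PropertyCollector class,
and the extract_properties function are all hypothetical names invented for
illustration; this is not a full RDFa processor (it ignores nesting,
prefixes, and typeof scoping), just enough to show that hours published once
on a web page can be read back by a machine.

```python
from html.parser import HTMLParser

# Hypothetical sample page; the markup is illustrative, not copied from a real site.
SAMPLE_HTML = """
<div vocab="http://schema.org/" typeof="Library">
  <h1 property="name">Example Library</h1>
  <span property="telephone">555-0100</span>
  <span property="openingHours" content="Mo-Fr 08:00-22:00">Weekdays 8 am to 10 pm</span>
  <span property="openingHours" content="Sa-Su 10:00-17:00">Weekends 10 am to 5 pm</span>
</div>
"""


class PropertyCollector(HTMLParser):
    """Collect (property, value) pairs from elements with an RDFa `property` attribute."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._pending = None  # property name still waiting for its text value

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # A `content` attribute supplies the machine-readable value directly.
            self.pairs.append((prop, attrs["content"]))
            self._pending = None
        else:
            # Otherwise fall back to the element's visible text.
            self._pending = prop

    def handle_data(self, data):
        if self._pending and data.strip():
            self.pairs.append((self._pending, data.strip()))
            self._pending = None

    def handle_endtag(self, tag):
        # Simplification: stop waiting for text once any element closes.
        self._pending = None


def extract_properties(html):
    """Return the list of (property, value) pairs found in the given HTML."""
    parser = PropertyCollector()
    parser.feed(html)
    parser.close()
    return parser.pairs


if __name__ == "__main__":
    for prop, value in extract_properties(SAMPLE_HTML):
        print(prop, "=", value)
```

A real harvester would fetch the page over HTTP and use a proper RDFa
parser, but the single-sourcing principle is the same: the page is the
only copy of the data.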

On the "but that's technically challenging" front, I tried pursuing some
grant funding to produce templates for publishing that structured info in
Drupal, Joomla, and other commonly used CMSs. Sadly, my application was
recently denied, but that will only slow me down; I'm not going to give up
on the goal. I have a paper in the works that will expand on the content of
the presentation for those sites that have the ability (technical and
administrative) to modify their own web pages.

Sites running the Evergreen library system already generate a page for each
of their libraries that contains this structured data (e.g.
https://laurentian.concat.ca/eg/opac/library/OSUL), which is single sourced
from the data that has to be maintained in the library system anyway.
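
To illustrate the single-sourcing principle (this is a hypothetical Python
sketch, not Evergreen's actual code or templates), a record that is already
maintained in one place can be rendered as an RDFa fragment using schema.org
terms. The record layout and the render_library_rdfa function are assumptions
made up for this example.

```python
from html import escape


def render_library_rdfa(record):
    """Render a hypothetical library record as an RDFa Lite fragment (schema.org terms)."""
    lines = ['<div vocab="http://schema.org/" typeof="Library">']
    lines.append(f'  <h1 property="name">{escape(record["name"])}</h1>')
    lines.append(f'  <span property="telephone">{escape(record["telephone"])}</span>')
    # Each entry pairs a machine-readable openingHours value with a human label.
    for spec, label in record["hours"]:
        lines.append(
            f'  <span property="openingHours" content="{escape(spec)}">'
            f"{escape(label)}</span>"
        )
    lines.append("</div>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical record; in practice this would come from the system of record.
    example = {
        "name": "Example Library",
        "telephone": "555-0100",
        "hours": [("Mo-Fr 08:00-22:00", "Weekdays 8 am to 10 pm")],
    }
    print(render_library_rdfa(example))
```

Updating the record then updates both the human-readable text and the
machine-readable values in one step.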

I'll happily acknowledge that getting search engines to harvest the right
data is not easy, though: right now, for example, a search for "J.N.
Desmarais Library" shows that the library is open 24 hours a
day, which is completely false--probably maliciously
submitted--information. *sigh* I've edited that info in the Google+ page at
https://plus.google.com/+JNDesmaraisLibraryGreaterSudbury but even though
it is a verified place and I am a manager of the G+ page, the edits still
go through approval by Googlers. There appears to be no good way to tell
Google "Hey, *this* is the URL you are looking for!". Somewhat amusingly,
the entire reason I started working with schema.org dates back to a
presentation I attended about Google Places years ago, where I whined about
having to maintain yet another copy of data in yet another place, and the
response implied that schema.org might be the solution to that problem.

Also, due to the structure of university web property ownership, we
currently don't have the ability to modify our actual library home page to
include any RDFa, which is a *wee* bit frustrating given my work in the
field. Heh.

Dan Scott
Laurentian University