I tend to agree with Jonathan Rochkind that having every library's bib
record turn up as a Google snippet would be unwelcome. Better to
mediate access to local library copies through something more
generic.

OCLC's WorldCat.org does get crawled and indexed in Google, though
WorldCat.org hits don't always make the first result screen. One
simple way for libraries whose holdings are reflected in WorldCat to
get more visibility through Google would be to make the (already
fairly simple) task of specifying worldcat.org as the search domain
even easier. WorldCat in turn can rank its display of holdings by
proximity to the searcher, so locally I can see which of the many
regional libraries around me in the Twin Cities have copies of a
title of interest. And since I have borrowing rights for most of the
public libraries, that's great.
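
Concretely, that just means using Google's site: operator. A search
like

   site:worldcat.org "the name of the rose"

(the title here is just an illustration) restricts the results to
WorldCat.org pages, where the holdings display takes over.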

But there's a catch--when WorldCat redirects a search to the selected
local library catalog, it targets the OCLC record number. If the
holding library has included the OCLC record number in its indexed
data, the user goes right to the desired record. If not, the user is
left wondering why the title of interest turned into some mysterious
number and the search failed.
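
For those less familiar with the plumbing: the OCLC record number is
typically carried in the bib record's MARC 035 field, so the handoff
depends on that field being indexed. A simplified example (the exact
prefix and padding vary by system):

   035    $a (OCoLC)123456789

If a number search on "123456789" retrieves the record, the WorldCat
link works; if not, the user hits the dead-end number search
described above.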

Stephen

On Thu, Feb 23, 2012 at 4:11 PM, David Friggens <[log in to unmask]> wrote:
>>>> why local library catalog records do not show up in search results?
>
> Basically, most OPACs are crap. :-) There are still some that don't
> provide persistent links to record pages, and most are designed so
> that the user has a "session" and gets kicked out after 10 minutes
> or so.
>
> These issues were part of Tim Spalding's message that, as well as
> joining web 2.0, libraries also need to join web 1.0.
> http://vimeo.com/user2734401
>
>>> We don't allow crawlers because it has caused serious performance issues in the past.
>
> Specifically (in our case at least), each request creates a new
> session on the server which doesn't time out for about 10 minutes,
> thus a crawler would fill up the system's RAM pretty quickly.
>
>> You can use Crawl-delay:
>> http://en.wikipedia.org/wiki/Robots_exclusion_standard#Crawl-delay_directive
>>
>> You can set Google's crawl rate in Webmaster Tools as well.
>
> I've had this suggested before and thought about it, but never had it
> high up enough in my list to test it out. Has anyone actually used the
> above to get a similar OPAC crawled successfully without bringing
> it to its knees?
>
> David
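
For anyone wanting to try the crawl-delay approach mentioned above,
the robots.txt addition is just two lines (note that Crawl-delay is a
non-standard directive and Googlebot ignores it, which is why the
Webmaster Tools crawl-rate setting also comes up):

   User-agent: *
   Crawl-delay: 10

The value is generally read as seconds between requests, so with a
10-minute session lifetime a 10-second delay still leaves roughly 60
sessions open at any moment from a single well-behaved crawler.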



-- 
Stephen Hearn, Metadata Strategist
Technical Services, University Libraries
University of Minnesota
160 Wilson Library
309 19th Avenue South
Minneapolis, MN 55455
Ph: 612-625-2328
Fx: 612-625-3428