LISTSERV 16.5 - CODE4LIB Archives

That example, "gil scott heron circle of stone", returns a JHU record as
the first hit, but the item is his thesis and the URL is found in the
Wikipedia article on Gil Scott-Heron, so it has a lot of PageRank
independently of the catalogue. I'm just curious: can you control for
that kind of externally prominent link in your 59% number, say by
taking the top x items that people hit from Google and searching their
URLs in Google to see if anyone else is linking to them? It would be
interesting to know how much of that 59% is due to the richness and
well-linkedness of JHU's special collections rather than to the
prominence Google is giving to the catalogue.
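
A minimal sketch of that control, in Python, assuming you already have
per-URL counts of Google-referred hits and some way of deciding whether
a URL is linked from outside the catalogue (for example, by checking it
by hand). The function names, the sample paths, and the counts are all
hypothetical and do not come from JHU's data.

```python
def top_items(google_hits, x):
    """Return the x catalog URLs with the most Google-referred hits."""
    return sorted(google_hits, key=google_hits.get, reverse=True)[:x]


def externally_linked_share(google_hits, has_external_links, x):
    """Among the top x items, return the fraction of Google-referred hits
    that land on items someone outside the catalogue links to.

    has_external_links is a caller-supplied function (URL -> bool); how
    that check is done is left open here.
    """
    top = top_items(google_hits, x)
    total = sum(google_hits[url] for url in top)
    if total == 0:
        return 0.0
    linked = sum(google_hits[url] for url in top if has_external_links(url))
    return linked / total


# Made-up sample data, for illustration only.
sample_hits = {"/catalog/a": 60, "/catalog/b": 20, "/catalog/c": 10}
known_external = {"/catalog/a"}

share = externally_linked_share(sample_hits, known_external.__contains__, x=2)
print(f"{share:.0%} of top-2 Google hits go to externally linked items")
```

A high share would suggest the 59% owes a lot to well-linked items, and
a low share would point more toward Google's treatment of the catalogue
itself.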

Peter




Peter Binkley
Digital Initiatives Technology Librarian
Information Technology Services
[log in to unmask]

4-30 Cameron Library
University of Alberta
Edmonton, Alberta
Canada T6G 2J8

phone 780-492-3743
fax 780-492-9243



-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
Sean Hannan
Sent: Thursday, February 23, 2012 11:37 AM
To: [log in to unmask]
Subject: Re: [CODE4LIB] Local catalog records and Google, Bing, Yahoo!

Our Blacklight-powered catalog (https://catalyst.library.jhu.edu/) comes
up a lot in Google search results (try gil scott heron circle of stone).

Some numbers:

59% of our total catalog traffic comes from Google searches
0.04% of our total catalog traffic comes from Yahoo searches
0.03% of our total catalog traffic comes from Bing searches

For context, 32.96% of our total catalog traffic is direct traffic and
referrals from all of the library websites combined.

Anecdotally, it would appear that Bing (and Bing-using Yahoo)
drastically play down catalog records in their results. We're not doing
anything to favor a particular search engine; we have a completely open
robots.txt file.
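
For anyone who wants to check their own catalog, here is a small Python
sketch using the standard library's urllib.robotparser. The robots.txt
text below is only an assumption about what "completely open" looks
like (an empty Disallow for every user agent); it is not copied from
the JHU site, and the example URL path is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed form of a "completely open" robots.txt: no path is disallowed.
OPEN_ROBOTS_TXT = """\
User-agent: *
Disallow:
"""


def can_crawl(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical record URL, for illustration only.
record_url = "https://catalyst.library.jhu.edu/catalog/example"
print(can_crawl(OPEN_ROBOTS_TXT, "Googlebot", record_url))
print(can_crawl("User-agent: *\nDisallow: /\n", "bingbot", record_url))
```

The second call shows the opposite case, a file that closes the whole
site to crawlers.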

Google regularly indexes our catalog, every couple of days or so,
though I haven't checked in a while.

We're not doing any fancy SEO here (though I'd like to implement some
of the microdata stuff). It's just a function of how the site works. We
link a lot of our catalog results to further searches (clicking on an
author name takes you to an author search with that name, etc.). Google
*loves* that type of intertextual website linking (see also: Wikipedia).
We also have stable URLs: search URLs will always return searches with
those parameters, and item URLs are based on an ID that does not change.
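
As a rough illustration of those two URL properties, here is a Python
sketch. It is not how Blacklight or the JHU catalog builds its links;
the "/catalog" path, the "q" and "search_field" parameter names, and
the record ID are assumptions made for the example.

```python
from urllib.parse import quote, urlencode

BASE = "https://catalyst.library.jhu.edu"


def item_url(record_id):
    """Build an item URL from a record ID that never changes."""
    return f"{BASE}/catalog/{quote(str(record_id), safe='')}"


def search_url(**params):
    """Build a search URL; sorting the parameters means the same search
    always produces the same URL."""
    return f"{BASE}/catalog?{urlencode(sorted(params.items()))}"


def author_link(name):
    """URL for the 'click an author name to search that author' link."""
    return search_url(q=name, search_field="author")


print(item_url("bib_123"))
print(author_link("Scott-Heron, Gil"))
```

Because every record page links out to searches like this and every
search result links back to stable item pages, a crawler can reach the
whole catalog just by following links.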

None of that good stuff helps us with Bing, though. But I'm not really
concerned with remedying that right at this moment.

-Sean

On 2/23/12 12:37 PM, "[log in to unmask]" <[log in to unmask]> wrote:

> First of all, I'm going to say I know little in this area. I've done
> some preliminary research about search indexing (Google's) and
> investigated a few OPAC robots.txt files. Now to my questions:
> 
>    - Can someone explain to me or point me to research as to why local
>    library catalog records do not show up in Google, Bing, or Yahoo!
>    search results?
>    - Do libraries generally prohibit search engines from crawling
>    their public records?
>    - Do the search engines not index these records actively?
>    - Is it a matter of SEO/promoted results?
>    - Is it because some systems don't mint URLs for each record?
> 
> I haven't seen a lot of discussion about this recently, and I know
> Jason Ronallo has done a lot of work in this area and gave a great
> talk at code4lib Seattle on microdata/Schema.org, so I figured this
> could be part of that continuing conversation.
> 
> I look forward to being educated by you all,
> 
> Tod