> I thought someone out there might be interested in a poster session I just
> did at the Innovative Users Group Conference 2009. I undertook the project
> because I was personally interested in the outcome, and because I look
> forward to the day when these data will be available from Google, from the
> Internet Archive, from HathiTrust, from ????.
> It's fraught with problems, including both recall and precision errors, but I
> call it an "approximation" of citation searching: searching for the books in
> the Colgate collection, then ranking them by the number of hits.
> I took about 688,000 monographic records from the Colgate library catalog that
> had both an author and a title, and constructed a search for each in Google
> Book Search. Since I wanted to find citations, or other books that mentioned
> the book in question, I didn't restrict the search to any field.
> Each search combined:
> - the title phrase from 245 subfields a & b, up to 10 words long, and
> - the first two words of the author (if a personal author),
>   the author phrase (if a conference author), or
>   the first 6 words of the author (if a corporate author).
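A minimal sketch of how such a query string might be assembled, in Python. The function names and the record layout (title subfields and author heading passed as plain strings, plus an author-type label) are hypothetical, and quoting the title and conference phrases is an assumption about how "phrase" was searched; the post does not give the exact query syntax. No field operators are added, matching the choice not to restrict by field.

```python
import re

# Hypothetical helpers; the post does not give the exact query syntax.

def words(text: str) -> list[str]:
    """Split a MARC heading into words, dropping punctuation such as ' :', ' /', ','."""
    return re.sub(r"[^\w\s'-]", " ", text).split()

def first_words(text: str, n: int) -> str:
    """Return the first n words of text, joined by single spaces."""
    return " ".join(words(text)[:n])

def build_query(title_a: str, title_b: str, author: str, author_type: str) -> str:
    # Title phrase from 245 $a and $b, capped at 10 words; quoting it as a
    # phrase search is an assumption.
    title = first_words(f"{title_a} {title_b}", 10)
    if author_type == "personal":
        author_part = first_words(author, 2)  # e.g. surname, forename
    elif author_type == "conference":
        author_part = '"' + " ".join(words(author)) + '"'  # whole name as a phrase
    elif author_type == "corporate":
        author_part = first_words(author, 6)
    else:
        raise ValueError(f"unknown author type: {author_type!r}")
    return f'"{title}" {author_part}'
```

For example, `build_query("Moby Dick, or, The whale /", "", "Melville, Herman,", "personal")` yields `"Moby Dick or The whale" Melville Herman`.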
> I ran these searches from 3/1/2009 to 4/27/2009 at fewer than 380 searches an
> hour (it took 3 machines to get the job done in 6 weeks).
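A sketch of spacing requests to stay under a rate like the one mentioned above, assuming the cap applied to each machine separately (the post doesn't say). The `Throttle` class is hypothetical; its clock and sleep functions are injectable so it can be tested without real waiting.

```python
import time
from typing import Callable, Optional

class Throttle:
    """Space successive calls at least 3600 / max_per_hour seconds apart.

    Hypothetical helper: clock and sleep are injectable for testing.
    """

    def __init__(self, max_per_hour: int,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 3600.0 / max_per_hour
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None

    def wait(self) -> None:
        """Block until the next request may be sent, then reserve the following slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

Calling `wait()` before each search with `max_per_hour=380` gives a gap of 3600/380 ≈ 9.5 seconds. Under the same per-machine assumption, 688,000 searches at 380 an hour is about 1,811 machine-hours, or roughly 604 hours (about 25 days) of nonstop querying on each of 3 machines.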
> I screen-scraped the hit count from Google's reported "1 to 8 of <#hits> records" line.
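A sketch of pulling the hit count out of a scraped results page, assuming the page text contains the "1 to 8 of <#hits> records" pattern quoted above, with the count possibly comma-separated. Google Book Search's actual 2009 markup isn't shown in the post, so the regular expression and `parse_hit_count` are illustrative only.

```python
import re
from typing import Optional

# Assumed pattern, based on the "1 to 8 of <#hits> records" text quoted in the post.
HIT_COUNT = re.compile(r"\b\d+\s+to\s+\d+\s+of\s+(\d[\d,]*)\s+records\b")

def parse_hit_count(page_text: str) -> Optional[int]:
    """Return the reported total number of hits, or None if the pattern is absent."""
    match = HIT_COUNT.search(page_text)
    if match is None:
        return None  # e.g. a no-results page or a changed layout
    return int(match.group(1).replace(",", ""))
```

Returning `None` rather than 0 keeps "no match found" distinguishable from a genuine zero count.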
> The results rank the books by the number of "citations" (hits).
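The ranking step amounts to a descending sort on the scraped counts. In MySQL that would be an `ORDER BY hits DESC` on whatever table holds the counts; the post doesn't describe the schema, so the `Result` record and its fields below are hypothetical, as is the choice to break ties alphabetically by title and to drop records whose count could not be scraped.

```python
from typing import NamedTuple, Optional

class Result(NamedTuple):
    record_id: str        # hypothetical catalog record identifier
    title: str
    hits: Optional[int]   # None if the hit count could not be scraped

def rank_by_hits(results: list[Result]) -> list[Result]:
    """Most-hit records first; ties broken alphabetically by title."""
    scored = [r for r in results if r.hits is not None]
    return sorted(scored, key=lambda r: (-r.hits, r.title))
```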
> My results omit GovDocs for the time being, since I forgot to download the
> 086 fields into the records; I could add them later. Those corporate bodies
> are problems for my search strategy anyway. I did include them in the search
> portion of the project.
> I don't know how many users this MySQL site will support. It's entirely
> un-stress-tested, but I trust you won't all go searching it at once.
> Cindy Harper, Systems Librarian
> Colgate University Libraries