I've been looking at the logs for our OAI server, and I'd like to appeal to
those harvesting over OAI to put a URL into the user agent string. Putting
the name of your project there as well seems like a great way to build its
profile. It also avoids the situation where the easiest way to contact you
is via the contacts associated with your DNS block.
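
For anyone wondering what that might look like in practice, here is a
minimal sketch in Python using only the standard library. The project name,
version, URLs and the two helper functions are all made up for illustration;
substitute your own details and whatever HTTP client your harvester
already uses.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_user_agent(project: str, version: str, info_url: str,
                     contact: Optional[str] = None) -> str:
    """Build a user agent string naming the project and a URL describing it."""
    details = "+" + info_url
    if contact:
        # Optional contact address, so server admins can reach you directly.
        details += "; " + contact
    return f"{project}/{version} ({details})"


def build_oai_request(base_url: str, user_agent: str, **params: str) -> Request:
    """Prepare (but do not send) an OAI-PMH request carrying the user agent."""
    query = urlencode(params)
    return Request(f"{base_url}?{query}", headers={"User-Agent": user_agent})


# Hypothetical project and repository, for illustration only.
ua = build_user_agent("ExampleHarvester", "0.1", "https://example.org/harvester")
req = build_oai_request("https://repository.example.org/oai", ua, verb="Identify")
# Pass req to urllib.request.urlopen() to actually send it.
```

The "+URL" in parentheses just follows the convention the crawlers in the
list below already use; any format that includes a name and a URL would do.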

For reference, these are some of the user agent strings I'm seeing
(standard browser strings removed):

"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
"Net::OAI::Harvester"
"Googlebot/2.1 (+http://www.google.com/bot.html)"
"OAIHarvester/2.0"
"Jakarta Commons-HttpClient/3.1"
"Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
"Celestial/3.02"
"WorldCat Digital Collection Gateway from OCLC.org"
"Apache-HttpClient/4.0.1 (java 1.5)"
"lwp-trivial/1.41"
"OAIGet-1.12"
"DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"
"Mozilla/5.0 (compatible; Sosospider/2.0; +http://help.soso.com/webspider.htm)"
"yacybot (freeworld/global; amd64 Linux 3.2.0-36-generic; java 1.6.0_27;
"OAIHarvesterObj 31 University of Illinois Library"
"PKPHarvester/2.x"
"OAI Harvester/1.0; FS Consulting, Inc."
"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
"Typhoeus - https://github.com/typhoeus/typhoeus"
....

cheers
stuart