> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> David Fiander
> Sent: Wednesday, September 02, 2009 9:32 AM
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] FW: PURL Server Update 2
> 
> If Millenium is acting like a robot in its
> monthly maintenance processes, then it should be checking robots.txt.

User agents are *not required* to check robots.txt, nor are servers
*required* to provide one.  There is no expectation around robots.txt
beyond a gentlemen's agreement that if a server provides one, it
should be consulted before any content is accessed from the server.
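For what it's worth, honoring that agreement takes only a few lines.
Here is a minimal Python sketch using the standard library's
urllib.robotparser; the host, path, and user-agent string are all
made up for illustration:

    import urllib.robotparser

    # Fetch and parse the server's robots.txt, if it offers one.
    # (If the file is missing, can_fetch() simply allows everything.)
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("http://purl.example.org/robots.txt")  # hypothetical host
    rp.read()

    # Consult it before requesting any content from the server.
    url = "http://purl.example.org/net/some-purl"  # hypothetical PURL
    if rp.can_fetch("MaintenanceBot/1.0", url):
        print("robots.txt permits fetching", url)
    else:
        print("robots.txt disallows fetching", url)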

However, if you have publicly accessible URIs, it is highly unlikely
that you would restrict access to those URIs in your robots.txt.
It kind of defeats the purpose of the URIs being *public*.  You
might put those URIs in robots.txt when they have been deprecated
and are being redirected to another URI, e.g., after you redesigned
your Web site, but 1) I would argue that it is better for your user
agents to see the redirect so they can update themselves, and 2) GPO
is running a PURL server, where the URIs are supposed to be
*permanent* and *publicly* accessible.
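If you did go the robots.txt route for deprecated URIs, the entry
would look something like this (the path is made up):

    User-agent: *
    Disallow: /old-site/

But note that a well-behaved agent that obeys the Disallow never sees
the 301/302 pointing at the new location, which is exactly the
information you want it to have.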

Robots.txt is a nice idea, but if you are having an issue with a
particular user agent, the network folks will most likely update the
router rules to block the traffic rather than let it get through to
the server.


<http://www.robotstxt.org/orig.html>


Andy.