Andrew Houghton wrote:
> In the case of GPO, they mentioned, or implied, that they were having
> contention issues with user agents hitting the server while trying to
> restore the data. This contention could be mitigated by imposing
> lower throttling limits in the router on user agents until the data
> is restored and then raising the limits back to whatever their
> prescribed SLA (service level agreement) was.
The GPO tech I spoke with mentioned this contention issue explicitly.
He had just emerged from a meeting on the PURL problem (which sounded
like an all-hands-on-deck affair). He mentioned that the contention
issue had been discussed in the meeting, but that they had decided not
to block the offending IPs (whether because they could not do so
effectively in time or on philosophical grounds, I did not inquire).
Throttling the user agents was not mentioned to me as a possibility.
In fact, unless I'm mistaken, the PURL server appears to be completely
inaccessible now, in advance of the advertised downtime this afternoon,
5-7 p.m. EST.
Jonathan LeBreton
Sr. Associate University Librarian
Temple University Libraries
voice: 215-204-8231
fax: 215-204-5201
email: [log in to unmask]
email: [log in to unmask]
> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> Houghton,Andrew
> Sent: Wednesday, September 02, 2009 11:27 AM
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] FW: PURL Server Update 2
>
> > From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> > Thomas Dowling
> > Sent: Wednesday, September 02, 2009 10:25 AM
> > To: [log in to unmask]
> > Subject: Re: [CODE4LIB] FW: PURL Server Update 2
> >
> > The III crawler has been a pain for years, and Innovative has shown
> > no interest in cleaning it up. It not only ignores robots.txt, but
> > it hits target servers just as fast and hard as it can. If you have
> > a lot of links that a lot of III catalogs check, its behavior is
> > indistinguishable from a DoS attack. (I know because our journals
> > server often used to crash at about 2:00am on the first of the
> > month...)
>
> I see that I didn't fully connect this to the point I was making,
> which is that there are hardware solutions to these issues rather
> than relying on robots.txt or sitemap.xml. If a user agent is a
> problem, then the network folks should configure the router to ignore
> that user agent or to reduce the number of requests it is allowed to
> make to the server.
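>
> For example (an untested sketch, not a recipe: the zone name and the
> backend address are made up, and I don't know what User-Agent string
> the III crawler actually sends, so the pattern is a stand-in), an
> nginx front end sitting in front of the server could throttle one
> user agent like this:
>
>     # key crawler traffic by client address; everything else gets an
>     # empty key, which nginx does not rate-limit at all
>     map $http_user_agent $throttle_key {
>         default    "";
>         ~*III      $binary_remote_addr;
>     }
>
>     # allow each matched client at most 1 request/second on average
>     limit_req_zone $throttle_key zone=crawlers:10m rate=1r/s;
>
>     server {
>         listen 80;
>         location / {
>             limit_req zone=crawlers burst=5;   # absorb short spikes
>             proxy_pass http://10.0.0.10:8080;  # the real server
>         }
>     }
>
> Real router hardware can do the same with policing/rate-limit rules;
> nginx just makes the idea easy to show in plain text.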
>
> In the case you point to, with III hitting the server as fast as it
> can and looking to the network like a DoS attack that crashed the
> server, then 1) the router hasn't been set up to impose throttling
> limits on user agents, and 2) the server probably isn't part of a
> load-balanced server farm.
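>
> On the second point, even a minimal software load balancer would
> spread that kind of burst across machines. A sketch, again in nginx
> and again with made-up addresses:
>
>     upstream journals {
>         least_conn;            # send each request to the least-busy box
>         server 10.0.0.11:8080;
>         server 10.0.0.12:8080;
>     }
>
>     server {
>         listen 80;
>         location / {
>             proxy_pass http://journals;
>         }
>     }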
>
> In the case of GPO, they mentioned, or implied, that they were having
> contention issues with user agents hitting the server while trying to
> restore the data. This contention could be mitigated by imposing
> lower throttling limits in the router on user agents until the data
> is restored and then raising the limits back to whatever their
> prescribed SLA (service level agreement) was.
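>
> With the throttling sketch above, that is a one-line change while the
> restore runs (the numbers here are invented, not GPO's actual SLA):
>
>     # during the restore: throttle matched agents hard
>     limit_req_zone $throttle_key zone=crawlers:10m rate=6r/m;
>
> and once the data is back, restore the normal service level:
>
>     # back to the prescribed rate
>     limit_req_zone $throttle_key zone=crawlers:10m rate=1r/s;
>
> with an "nginx -s reload" after each edit.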
>
> You really don't need a document on the server to tell user agents
> what to do. You can, and should, impose a network policy on user
> agents, which in my opinion is a far better solution.
>
>
> Andy.