Hi Michael,

Hmm... well, that doesn't seem right at all, does it?  Thank you for
pointing it out. I've sent this along to our petabox team to see if they
can put up the correct error codes.

Alexis


Klein, Michael wrote:
> Peter,
>
> I've seen no official information or documentation from the Internet Archive
> either. I've actually been quite frustrated by several issues for a while
> now. For example: If you go to
> http://www.archive.org/details/nonexistentidentifier you'll get a
> human-readable web page stating that the item cannot be found. That page,
> however, is served up with an HTTP status of 200 OK, not 404 NOT FOUND.
>
> In addition, I've noticed that when certain requests fail due to system load
> and other issues, I get back an HTML page saying something like "the system
> is experiencing slowness," but again with a 200 OK instead of a 503 SERVICE
> UNAVAILABLE (ideally with a Retry-After header).
>
> These things alone make it extremely difficult to automate any large-scale
> metadata retrieval from the Internet Archive, and that's without any attempt
> to download content.
>
> I'm working on a post documenting some of the techniques and strategies that
> have worked for us, but it's not quite ready for human consumption yet.
>
> Michael
>
> --
> Michael B. Klein
> Digital Initiatives Technology Librarian
> Boston Public Library
> [log in to unmask]
>
>
>
>> From: "Binkley, Peter" <[log in to unmask]>
>> Reply-To: "Code for Libraries <[log in to unmask]>"
>> <[log in to unmask]>
>> Date: Thu, 5 Jun 2008 13:08:13 -0600
>> To: <[log in to unmask]>
>> Conversation: [CODE4LIB] Internet Archive collection codes?
>> Subject: Re: [CODE4LIB] Internet Archive collection codes?
>>
>> While we're on the subject, are there any more up-to-date instructions
>> for harvesting from Internet Archive than these?
>> http://biodiversitylibrary.blogspot.com/2008/03/harvesting-process-from-internet_14.html
>>
>> And does IA provide guidelines for harvesting (traffic limits etc.)? I
>> clicked around the site a bit and didn't find them, but could easily
>> have missed them.
>>
>> Peter
>>
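
For anyone scripting against this in the meantime, here is a minimal sketch
of one way a harvester might cope with the behaviour Michael describes: error
pages that arrive with 200 OK. It inspects the body as well as the status
code. The marker phrases, the function name, and the 60-second default are
all assumptions for illustration, paraphrased from the messages above; they
are not documented Internet Archive strings or limits, and the real page
wording would need to be checked before relying on this.

```python
from typing import Optional, Tuple

# Hypothetical marker phrases, paraphrased from the thread above.
# The actual wording of the Internet Archive pages may differ.
NOT_FOUND_MARKERS = ("cannot be found",)
OVERLOADED_MARKERS = ("experiencing slowness",)

# Arbitrary fallback delay (an assumption, not a published traffic limit).
DEFAULT_RETRY_SECONDS = 60


def classify_response(
    status: int, body: str, retry_after: Optional[str] = None
) -> Tuple[str, Optional[int]]:
    """Return (state, delay_seconds) for one HTTP response.

    state is one of "ok", "not_found", "retry", "error".
    A 200 response whose body looks like an error page is treated
    as the error it should have been.
    """
    text = body.lower()

    # Real 404, or a "soft 404" served with 200 OK.
    if status == 404 or (status == 200 and any(m in text for m in NOT_FOUND_MARKERS)):
        return ("not_found", None)

    # Real 503, or an overload page served with 200 OK.
    if status == 503 or (status == 200 and any(m in text for m in OVERLOADED_MARKERS)):
        delay = DEFAULT_RETRY_SECONDS
        # Only the integer-seconds form of Retry-After is handled here;
        # the HTTP-date form falls back to the default delay.
        if retry_after is not None and retry_after.strip().isdigit():
            delay = int(retry_after.strip())
        return ("retry", delay)

    if 200 <= status < 300:
        return ("ok", None)

    return ("error", None)
```

The function takes the status, body, and Retry-After value as plain
arguments, so it can sit behind whatever HTTP client is already in use. A
caller would skip the identifier on "not_found" and sleep for the returned
delay before retrying on "retry".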