Jonathan,

Amazon.com doesn't seem to allow HEAD requests -- it returns a 405 METHOD
NOT ALLOWED status. What's more, GET responses don't seem to include
Content-Length headers.

One thing I've noticed, though, is that the "unavailable" response doesn't
include a <title> element, while the regular reader does. You may be able to
come up with a way to make that quicker and more reliable than grepping the
full text.
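
Here is a rough sketch of that idea in Python, offered as a starting point
rather than a tested solution. It reads the response in small pieces and stops
as soon as a <title> tag turns up. The function names and the 16 KB cutoff are
my own assumptions, and the <title> heuristic is only what I observed; Amazon
could change it at any time.

```python
import re
import urllib.request

# Matches the start of a <title> tag, with or without attributes.
TITLE_RE = re.compile(rb"<title[\s>]", re.IGNORECASE)


def has_title(chunks, max_bytes=16384):
    """Return True if a <title> tag appears within roughly the first max_bytes.

    `chunks` is any iterable of bytes objects. The 16 KB default is an
    arbitrary guess at how far into the page the <head> can extend.
    """
    seen = b""
    for chunk in chunks:
        seen += chunk  # accumulate so a tag split across chunks still matches
        if TITLE_RE.search(seen):
            return True
        if len(seen) >= max_bytes:
            break
    return False


def read_chunks(response, size=2048):
    """Yield the response body in small pieces so the caller can stop early."""
    while True:
        chunk = response.read(size)
        if not chunk:
            return
        yield chunk


def has_search_inside(asin):
    """Guess whether search-inside is available for an ASIN (hypothetical helper).

    Uses the reader URL pattern from Jonathan's message. A missing <title>
    is treated as the "unavailable" page; that is an observation, not a
    documented behavior.
    """
    url = "http://www.amazon.com/gp/reader/" + asin
    with urllib.request.urlopen(url) as response:
        return has_title(read_chunks(response))
```

Since the check stops reading once it finds the tag, it should usually pull
down only the first couple of kilobytes instead of the whole page.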

Michael

-- 
Michael B. Klein
Digital Initiatives Technology Librarian
Boston Public Library
(617) 859-2391
[log in to unmask]


> From: Jonathan Rochkind <[log in to unmask]>
> Reply-To: "Code for Libraries <[log in to unmask]>"
> <[log in to unmask]>
> Date: Fri, 27 Jun 2008 12:00:54 -0400
> To: <[log in to unmask]>
> Subject: Re: [CODE4LIB] Amazon Web Services and search-inside-the-book
> 
> Excellent, thanks Charles.
> 
> I can tell you that my technique seems to be working fine, if you want
> to try it too.
> 
> Construct a URL:
> 
> http://www.amazon.com/gp/reader/ASIN
> 
> Request the URL.  Grep the response for "book is temporarily
> unavailable"--if you get it, there's no search inside the book. If you
> don't get it, there is search inside the book. (Sadly, it's still a 200
> HTTP status in response, either way).
> 
> I want to see whether I can just do a HEAD request and tell the
> difference between presence and absence of search inside by the
> advertised length of the response. That's Terry Reese's preferred way of
> doing a check for legitimate content at the end of a URL, trying to
> guess from content length with just a HEAD request. Not sure if that
> will work here or not. Would potentially be somewhat more efficient if
> it would.
> 
> Jonathan

> -- 
> Jonathan Rochkind
> Digital Services Software Engineer
> The Sheridan Libraries
> Johns Hopkins University
> 410.516.8886 
> rochkind (at) jhu.edu