Jonathan,
Amazon.com doesn't seem to allow HEAD requests -- it returns a 405 METHOD
NOT ALLOWED status. What's more, GET responses don't seem to include
Content-Length headers.
One thing I've noticed, though, is that the "unavailable" response doesn't
include a <title> element, while the regular reader page does. You may be
able to build a check on that which is quicker and more reliable than
grepping the full text.
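
Here is a rough sketch of what I mean, in Python. It is untested against Amazon itself and rests on assumptions: the function names and the 4096-byte cutoff are my own inventions, and I'm guessing that the <title> element, when present, shows up near the top of the page. The idea is to read only the start of the GET response and look for an opening <title> tag.

```python
import re
import urllib.request

# URL pattern taken from Jonathan's message; ASIN is filled in per book.
READER_URL = "http://www.amazon.com/gp/reader/{asin}"

# Matches an opening <title> tag, with or without attributes, in any case.
TITLE_RE = re.compile(rb"<title\b", re.IGNORECASE)


def has_title(chunk: bytes) -> bool:
    """Return True if the given bytes contain an opening <title> tag."""
    return TITLE_RE.search(chunk) is not None


def has_search_inside(asin: str, max_bytes: int = 4096) -> bool:
    """Guess whether a book has search-inside by checking for <title>.

    Assumption: the <title> tag, if present, appears within the first
    max_bytes of the response. The cutoff is a hypothetical value.
    """
    url = READER_URL.format(asin=asin)
    with urllib.request.urlopen(url, timeout=10) as resp:
        # Read up to max_bytes instead of downloading the whole page.
        head = resp.read(max_bytes)
    return has_title(head)
```

If the title turns out to sit further down the page, raising max_bytes (or falling back to the full-text grep) would be the obvious adjustment.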
Michael
--
Michael B. Klein
Digital Initiatives Technology Librarian
Boston Public Library
(617) 859-2391
[log in to unmask]
> From: Jonathan Rochkind <[log in to unmask]>
> Reply-To: "Code for Libraries <[log in to unmask]>"
> <[log in to unmask]>
> Date: Fri, 27 Jun 2008 12:00:54 -0400
> To: <[log in to unmask]>
> Subject: Re: [CODE4LIB] Amazon Web Services and search-inside-the-book
>
> Excellent, thanks Charles.
>
> I can tell you that my technique seems to be working fine, if you want
> to try it too.
>
> Construct a URL:
>
> http://www.amazon.com/gp/reader/ASIN
>
> Request the URL. Grep the response for "book is temporarily
> unavailable"--if you get it, there's no search inside the book. If you
> don't get it, there is search inside the book. (Sadly, it's still a 200
> HTTP status in response, either way).
>
> I want to see whether I can just do a HEAD request and tell the
> difference between presence and absence of search inside by the
> advertised length of the response. That's Terry Reese's preferred way of
> doing a check for legitimate content at the end of a URL, trying to
> guess from content length with just a HEAD request. Not sure if that
> will work here or not. Would potentially be somewhat more efficient if
> it would.
>
> Jonathan
> --
> Jonathan Rochkind
> Digital Services Software Engineer
> The Sheridan Libraries
> Johns Hopkins University
> 410.516.8886
> rochkind (at) jhu.edu