Godmar Back wrote:
> ps: the distribution of the full text availability for the sample
> considered was as follows:
>
> No preview: 797 (93.5%)
> > For 1000 randomly drawn ISBN from 3,192,809 ISBN extracted from a
> > snapshot of LoC's records [2], Google Books returned results for 852
> > ISBN.
> > I found the results (85.2% recall and >99% precision, if you allow for
> > the ISBN substitution; with a 3.1% margin of error) surprisingly high.
> >
> > [2] http://www.archive.org/details/marc_records_scriblio_net
But doesn't "no preview" mean books that Google hasn't scanned?
If Google had downloaded [2] and incorporated the bibliographic
records in their collection, then the recall would have gone from
85% to 100%. How impressive is that really?
I'm prepared to be impressed if they have indeed scanned books for
6.5% of all ISBNs in the Library of Congress. But that's not
really 85% recall.
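
The arithmetic behind these percentages can be sketched in a few lines
of Python. This is only a rough check, not anything from the original
study: the variable names are made up for illustration, and it assumes
that the 93.5% is relative to the 852 hits and that every hit not
marked "no preview" has at least some scanned text.

```python
# Figures quoted in the thread.
sample_size = 1000   # ISBNs drawn at random from the LoC snapshot
hits = 852           # ISBNs for which Google Books returned a result
no_preview = 797     # hits reported as "No preview"

# Assumption: any hit that is not "no preview" has some scanned text.
with_preview = hits - no_preview

# Recall if any bibliographic record counts as a hit.
recall_any_record = hits / sample_size

# Share with scanned text, relative to the hits and to the whole sample.
share_of_hits_with_preview = with_preview / hits
share_of_sample_with_preview = with_preview / sample_size

print(f"hits with some preview:      {with_preview}")
print(f"recall (any record):         {recall_any_record:.1%}")
print(f"previewable share of hits:   {share_of_hits_with_preview:.1%}")
print(f"previewable share of sample: {share_of_sample_with_preview:.1%}")
```

Under those assumptions the 6.5% is a share of the 852 hits; measured
against the full sample of 1000 ISBNs, the share would be somewhat
lower.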
--
Lars Aronsson
Aronsson Datateknik - http://aronsson.se