Thank you all for the replies.

To summarize:

- Tim Spalding offered LibraryThing's database at
http://www.librarything.com/wiki/index.php/LibraryThing_APIs
- Roy Tennant pointed to MIT's Barton dump, available at
<http://simile.mit.edu/rdf-test-data/>

But the winner is probably this Python script, based on Ed's suggestion:

-----
#!/usr/bin/python

from urllib import urlopen
from pymarc import MARCReader

locrecordspattern = 'http://www.archive.org/download/marc_records_scriblio_net/part%02d.dat'

for part in range(1, 30):
    for record in MARCReader(urlopen(locrecordspattern % part)):
        if record['020'] and record['020']['a']:
            print record['020']['a']
------
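
The script above prints every ISBN it finds, which is far more than the few hundred I was after. One way to cut that down without holding everything in memory would be a reservoir sample over the stream. What follows is only a rough sketch, written for Python 3 and not part of the script above: the names clean_isbn and reservoir_sample are made up for illustration, and the cleanup step assumes that 020$a values often carry a trailing qualifier such as "(pbk.)" after the number.

```python
import random
import re


def clean_isbn(field_value):
    """Return the leading ISBN token from an 020$a value, or None.

    Assumption: values look like '0123456789 (pbk.)'. Hyphens are dropped
    and only 10- or 13-character results are kept; no checksum is verified.
    """
    match = re.match(r"\s*([0-9][0-9-]*[0-9Xx])", field_value or "")
    if not match:
        return None
    isbn = match.group(1).replace("-", "").upper()
    return isbn if len(isbn) in (10, 13) else None


def reservoir_sample(items, k, rng=random):
    """Keep a uniform random sample of up to k items from a stream."""
    sample = []
    for seen, item in enumerate(items, start=1):
        if len(sample) < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Pick a slot in [0, seen); the new item replaces an old one
            # only when the slot falls inside the reservoir.
            slot = rng.randrange(seen)
            if slot < k:
                sample[slot] = item
    return sample


if __name__ == "__main__":
    # Stand-in values; in the real script these would come from
    # record['020']['a'] inside the MARCReader loop.
    raw = ["0123456789 (pbk.)", "0-8044-2957-X", "not an isbn", None]
    isbns = (isbn for isbn in map(clean_isbn, raw) if isbn)
    for isbn in reservoir_sample(isbns, 300):
        print(isbn)
```

Because the reservoir never grows past k entries, this should work the same whether it is fed one part file or all of them.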

Now if I could only figure out how to install "easy_install" on FC8 so
I didn't have to run it with:
env PYTHONPATH=`pwd`/pymarc-2.21 ./readloc.py

 - Godmar

On Tue, Apr 29, 2008 at 8:20 AM, Ed Summers wrote:
> You could download a snapshot of the full LC back file at the Internet
>  Archive (kindly donated by Scriblio).
>
>   http://www.archive.org/details/marc_records_scriblio_net
>
>  Then run a script using your favorite MARC parsing library (mine
>  currently is pymarc):
>
>   from pymarc import MARCReader
>
>   for record in MARCReader(file('part01.dat')):
>       if record['020'] and record['020']['a']:
>           print record['020']['a']
>
>  //Ed
>
>
>
>  On Mon, Apr 28, 2008 at 9:35 AM, Godmar Back wrote:
>  > Hi,
>  >
>  >  for an investigation/study, I'm looking to obtain a representative
>  >  sample set (say a few hundred) of ISBNs. For instance, the sample
>  >  could represent LoC's holdings (or some other acceptable/meaningful
>  >  population in the library world).
>  >
>  >  Does anybody have any pointers/ideas on how I might go about this?
>  >
>  >  Thanks!
>  >
>  >   - Godmar
>  >
>