On Dec 13, 2007, at 10:33 AM, Eric Lease Morgan wrote:

>> Put another way, if I want to use Net::OAI::Harvester to read
>> repository data in a form other than DC, will I need to write
>> an additional module such as Net::OAI::Record::MARCXML?
>
> But I'm lazy, and even though it is not the best solution, I will
> explore another option. Specifically, I will use oai_dump (which
> comes with N::O::H), change the metadata scheme from oai_dc to
> marc21, run the script, and parse the resulting XML. If I'm lucky
> my parser can be written as a SAX filter that can be added
> to the N::O::H distribution. In the meantime, at least I will have
> the data. Wish me luck.



After getting most of my MARCXML/SAX parser written, Ed Summers
presented me with a couple of Perl modules allowing me to return
MARC::Record objects from the harvest of OAI repositories supporting
the marc21 metadata schema. This is what I originally wanted to do.
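
For anyone curious what the harvested data looks like, here is a
minimal sketch in Python (not the Perl modules mentioned above) that
pulls identifiers and 245$a titles out of an OAI-PMH ListRecords
response carrying MARCXML. The sample response, the identifier
oai:example.org:1, and the function name extract_titles are made up
for illustration; only the two XML namespaces are the standard ones.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for OAI-PMH 2.0 and MARCXML ("MARC21 slim").
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "marc": "http://www.loc.gov/MARC21/slim",
}

# Hypothetical, hand-written response; a real one comes from a
# ListRecords request with metadataPrefix=marc21.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <record xmlns="http://www.loc.gov/MARC21/slim">
          <leader>00000nam a2200000 a 4500</leader>
          <controlfield tag="001">1</controlfield>
          <datafield tag="245" ind1="1" ind2="0">
            <subfield code="a">Walden /</subfield>
            <subfield code="c">Henry David Thoreau.</subfield>
          </datafield>
        </record>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def extract_titles(xml_text):
    """Return a list of (OAI identifier, 245$a title) pairs."""
    root = ET.fromstring(xml_text.strip())
    results = []
    # Only OAI-level <record> elements match; the MARC <record>
    # lives in a different namespace.
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        marc = rec.find("oai:metadata/marc:record", NS)
        if marc is None:
            continue  # deleted records carry no metadata
        title = marc.findtext(
            "marc:datafield[@tag='245']/marc:subfield[@code='a']",
            namespaces=NS,
        )
        results.append((identifier, title))
    return results


if __name__ == "__main__":
    for identifier, title in extract_titles(SAMPLE_RESPONSE):
        print(identifier, title)
```

A real harvest would also need to follow resumption tokens across
multiple responses, which this sketch leaves out.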

Using this technology I was able to harvest the metadata (MARC
records) of 70,000 University of Michigan digitized books (MBooks). I
then fed them to an indexer -- Zebra -- that reads raw MARC very
well, and provided a rudimentary interface to the index via SRU:

   http://infomotions.com/ii/
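
As a rough illustration of what an SRU request looks like, the
Python sketch below builds a searchRetrieve URL. The endpoint
http://example.org/sru, the index name dc.title, and the function
name build_sru_url are assumptions for the example, not details of
the interface above; the parameter names come from the SRU 1.1
protocol.

```python
from urllib.parse import urlencode


def build_sru_url(base_url, cql_query, maximum_records=10):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": maximum_records,
    }
    # urlencode percent-escapes the CQL query for use in a URL.
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical endpoint and index name.
    print(build_sru_url("http://example.org/sru", 'dc.title = "walden"'))
```

Which indexes can be searched depends on how the server is
configured, so dc.title may not be available on a given server.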

In the end the process was almost trivial and can easily be expanded
to include other types of content.

Thank you to all who helped along the way!

--
Eric Lease Morgan
University Libraries of Notre Dame

(574) 631-8604