On Dec 13, 2007, at 9:48 AM, Ed Summers wrote:

>   use Net::OAI::Harvester;
>   use MODSHandler;
>
>   my $url = 'http://memory.loc.gov/cgi-bin/oai2_0';
>   my $harvester = Net::OAI::Harvester->new(baseURL => $url);
>   my $records = $harvester->listRecords(
>      metadataPrefix => 'mods',
>      metadataHandler => 'MODSHandler'
>   );
>
>   while ($record = $records->next()) {
>       print $record->metadata()->title(), "\n";
>   }
>
> ...
>
> Interestingly back in 2000 or whatever when this was written it felt
> like pretty state of the art to use filters in this way. But today it
> seems kind of overkill to have to write a state-machine just to get at
> some XML. The ruby oai library [2] I worked on more recently kind of
> bucks the trend of not trying to create fancy objects for records and
> hand waving memory concerns (which never seemed to surface) and just
> returns back what amounts to a DOM and lets the user figure out what
> they want.



What type(s) of data are the methods called on the object returned by
the metadata method (above) expected to return? Only scalars? How
about objects? How about other Perl data structures like a hash (of
hashes)? Is there a predefined set of methods that can be called on
that object?

I suppose the aforementioned MODSHandler can be designed to support
any number of methods returning different types of data. Correct?
For example, the code above is designed to return a title. Additional
methods might return authors, subjects, publishers, etc.
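
To make the question concrete, here is a minimal sketch of a handler-like
object whose methods return a scalar, a list, and a hash reference. It
assumes that metadata() simply hands back the handler object, so the
available methods are whatever the handler class defines. The package
name My::Metadata, its fields, and its sample values are all invented
for illustration; none of this is taken from MODSHandler itself.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a metadata handler object; the field names
# and values are invented for illustration only.
package My::Metadata;

sub new {
    my ( $class, %fields ) = @_;
    my $self = {%fields};
    return bless $self, $class;
}

# Returns a scalar.
sub title { return $_[0]{title} }

# Returns a list.
sub authors { return @{ $_[0]{authors} || [] } }

# Returns a reference to a hash (which may itself hold other structures).
sub as_hash {
    my $self = shift;
    return { title => $self->{title}, authors => [ $self->authors ] };
}

package main;

my $metadata = My::Metadata->new(
    title   => 'An example title',
    authors => [ 'First Author', 'Second Author' ],
);

print $metadata->title, "\n";
print join( '; ', $metadata->authors ), "\n";
print join( ', ', sort keys %{ $metadata->as_hash } ), "\n";
```

Nothing in Perl itself limits a method to returning scalars, so the real
question is only what the harvester does with the handler, which this
sketch does not address.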

Spurred on by the availability of MBooks from the University of
Michigan [1], I have written the beginnings of a SAX filter for
MARCXML data. Currently it iterates over MARCXML, parses the data,
and prints to STDOUT something looking like a MARC tagged display.
Ironically, this was rather easy because MARCXML only has a limited
number of elements: leader, controlfield, datafield, and subfield.
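
A filter along these lines might look roughly like the sketch below. It
is not the actual code, just an illustration of how little state the
four elements require. The package name MARCXML::TaggedDisplay, the
display format, and the sample record are assumptions. A real filter
would normally inherit from XML::SAX::Base and be fed by a parser; to
keep the example self-contained, the SAX events are built and fired by
hand here.

```perl
use strict;
use warnings;

package MARCXML::TaggedDisplay;    # hypothetical name

sub new { return bless { text => '', out => '' }, shift }

# Look up an un-namespaced attribute value, SAX2 style ("{}name" keys).
sub _attr {
    my ( $element, $name ) = @_;
    my $attr = $element->{Attributes}{"{}$name"};
    return defined $attr ? $attr->{Value} : '';
}

sub start_element {
    my ( $self, $element ) = @_;
    my $name = $element->{LocalName};
    $self->{text} = '';    # start collecting character data afresh
    if ( $name eq 'controlfield' ) {
        $self->{tag} = _attr( $element, 'tag' );
    }
    elsif ( $name eq 'datafield' ) {
        my $indicators = _attr( $element, 'ind1' ) . _attr( $element, 'ind2' );
        $indicators =~ tr/ /_/;    # make blank indicators visible
        $self->{out} .= _attr( $element, 'tag' ) . " $indicators";
    }
    elsif ( $name eq 'subfield' ) {
        $self->{code} = _attr( $element, 'code' );
    }
    return;
}

sub characters {
    my ( $self, $chars ) = @_;
    $self->{text} .= $chars->{Data};
    return;
}

sub end_element {
    my ( $self, $element ) = @_;
    my $name = $element->{LocalName};
    if    ( $name eq 'leader' )       { $self->{out} .= "LDR $self->{text}\n" }
    elsif ( $name eq 'controlfield' ) { $self->{out} .= "$self->{tag} $self->{text}\n" }
    elsif ( $name eq 'subfield' )     { $self->{out} .= " \$$self->{code} $self->{text}" }
    elsif ( $name eq 'datafield' )    { $self->{out} .= "\n" }
    return;
}

sub display { return $_[0]{out} }

package main;

# Build a SAX2-style element hash by hand; a real parser would supply these.
sub element {
    my ( $name, %attributes ) = @_;
    my %sax;
    $sax{"{}$_"} = { Name => $_, LocalName => $_, Value => $attributes{$_} }
        for keys %attributes;
    return { Name => $name, LocalName => $name, Attributes => \%sax };
}

# Fire start, characters, and end events for a text-only element.
sub feed {
    my ( $handler, $element, $text ) = @_;
    $handler->start_element($element);
    $handler->characters( { Data => $text } );
    $handler->end_element($element);
    return;
}

my $handler = MARCXML::TaggedDisplay->new;
feed( $handler, element('leader'), '00000nam a2200000 a 4500' );
feed( $handler, element( 'controlfield', tag => '001' ), '12345' );

my $datafield = element( 'datafield', tag => '245', ind1 => '1', ind2 => '0' );
$handler->start_element($datafield);
feed( $handler, element( 'subfield', code => 'a' ), 'An example title' );
$handler->end_element($datafield);

print $handler->display;
```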

Using Ed's code as a model, I think I could create a method called
MARC that returns a MARC::Record object, like this:

   use Net::OAI::Harvester;
   use MARCXML;

   my $url = 'http://memory.loc.gov/cgi-bin/oai2_0';
   my $harvester = Net::OAI::Harvester->new( baseURL => $url );
   my $records = $harvester->listRecords(
        metadataPrefix  => 'marc21',
        metadataHandler => 'MARCXML'
   );

   while ( my $record = $records->next ) {

        # call the (proposed) MARC method, returning a MARC::Record object
        my $marc = $record->metadata()->MARC;

        # apply cool MARC::Record methods against the object
        print $marc->title, "\n";

   }

Alternatively, I suppose I could create methods like this:

   $leader  = $record->metadata()->leader;
   $control = $record->metadata()->control;
   $title   = $record->metadata()->datafield( '245', 'a' );
   $author  = $record->metadata()->datafield( '100', 'a' );
   $url     = $record->metadata()->datafield( '856', 'u' );

Is this approach a good idea? On the other hand, maybe I should
return the whole record in all of its MARC glory. Which approach is
better? Maybe I should do both? Maybe I should return a DOM as Ed
alludes to. Ah, the choices!
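
For comparison, the accessor approach could be sketched as follows,
assuming the filter has already accumulated the record into a plain
Perl data structure. The package name MARCXML::Fields and the layout of
that structure are made up for this example. Unlike the one-liner
above, control() here takes a tag, since a record can carry several
control fields.

```perl
use strict;
use warnings;

package MARCXML::Fields;    # hypothetical name

# $data is whatever the filter accumulated; this layout is an assumption:
#   leader  => string
#   control => { tag => value }
#   data    => { tag => [ field, ... ] }, each field a list of [ code, value ]
sub new {
    my ( $class, $data ) = @_;
    return bless $data, $class;
}

sub leader { return $_[0]{leader} }

sub control {
    my ( $self, $tag ) = @_;
    return $self->{control}{$tag};
}

# All matching subfield values in list context, the first in scalar context.
sub datafield {
    my ( $self, $tag, $code ) = @_;
    my @values;
    for my $field ( @{ $self->{data}{$tag} || [] } ) {
        push @values, map { $_->[1] } grep { $_->[0] eq $code } @{$field};
    }
    return wantarray ? @values : $values[0];
}

package main;

my $metadata = MARCXML::Fields->new(
    {
        leader  => '00000nam a2200000 a 4500',
        control => { '001' => '12345' },
        data    => {
            '245' => [ [ [ a => 'An example title' ] ] ],
            '650' => [ [ [ a => 'Libraries' ] ], [ [ a => 'Metadata' ] ] ],
        },
    }
);

my $title    = $metadata->datafield( '245', 'a' );
my @subjects = $metadata->datafield( '650', 'a' );
print "$title\n";
print join( '; ', @subjects ), "\n";
```

The trade-off seems to be that accessors like these keep the handler
free of extra dependencies, while handing back a MARC::Record object
gives the caller its existing methods (title, field, subfield, and so
on) without any new API to learn.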


[1] http://lists.webjunction.org/wjlists/xml4lib/2007-December/005978.html

--
Eric Lease Morgan
University Libraries of Notre Dame