On Dec 13, 2007, at 9:48 AM, Ed Summers wrote:

> use Net::OAI::Harvester;
> use MODSHandler;
>
> my $url = 'http://memory.loc.gov/cgi-bin/oai2_0';
> my $harvester = Net::OAI::Harvester->new(baseURL => $url);
> my $records = $harvester->listRecords(
>     metadataPrefix => 'mods',
>     metadataHandler => 'MODSHandler'
> );
>
> while ($record = $records->next()) {
>     print $record->metadata()->title(), "\n";
> }
>
> ...
>
> Interestingly back in 2000 or whatever when this was written it felt
> like pretty state of the art to use filters in this way. But today it
> seems kind of overkill to have to write a state-machine just to get at
> some XML. The ruby oai library [2] I worked on more recently kind of
> bucks the trend of not trying to create fancy objects for records and
> hand waving memory concerns (which never seemed to surface) and just
> returns back what amounts to a DOM and lets the user figure out what
> they want.

What types of data are the methods called on the object returned by metadata() (above) expected to return? Only scalars? What about objects, or other Perl data structures such as a hash (of hashes)? Is there a predefined set of methods that can be called on that object?

I suppose the aforementioned MODSHandler can be designed to support any number of methods returning different types of data. Correct? For example, the code above is designed to return a title. Additional methods might return authors, subjects, publishers, etc.

Spurred on by the availability of MBooks from the University of Michigan [1], I have written the beginnings of a SAX filter for MARCXML data. Currently it iterates over MARCXML, parses the data, and prints to STDOUT something that looks like a MARC tagged display. Ironically, this was rather easy because MARCXML has only a handful of elements: leader, controlfield, datafield, and subfield.
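To make the idea concrete, a filter along these lines might look roughly like the sketch below. It is not my actual filter. The package name MARCXML::TaggedDisplay, the helper tagged_display(), the output layout, and the sample record are all made up for illustration, and it assumes XML::SAX is installed from CPAN.

```perl
use strict;
use warnings;

# Hypothetical handler: the package name and output layout are illustrative
# assumptions, not the filter described in this message.
package MARCXML::TaggedDisplay;
use base 'XML::SAX::Base';

sub start_document {
    my ($self) = @_;
    $self->{lines} = [];    # completed display lines
    $self->{text}  = '';    # character data of the element being read
    $self->{field} = '';    # field line under construction
}

sub start_element {
    my ( $self, $el ) = @_;

    # SAX2 keys attributes by "{namespace}name"; re-key them by local name.
    my %attr = map { $_->{LocalName} => $_->{Value} }
        values %{ $el->{Attributes} };

    $self->{text} = '';
    my $name = $el->{LocalName};
    if ( $name eq 'controlfield' ) {
        $self->{field} = $attr{tag};
    }
    elsif ( $name eq 'datafield' ) {
        my ( $ind1, $ind2 ) =
            map { defined $_ ? $_ : ' ' } @attr{qw(ind1 ind2)};
        $self->{field} = "$attr{tag} $ind1$ind2";
    }
    elsif ( $name eq 'subfield' ) {
        $self->{code} = $attr{code};
    }
}

sub characters {
    my ( $self, $chars ) = @_;
    $self->{text} .= $chars->{Data};    # may arrive in several chunks
}

sub end_element {
    my ( $self, $el ) = @_;
    my $name = $el->{LocalName};
    ( my $text = $self->{text} ) =~ s/^\s+|\s+$//g;

    if    ( $name eq 'leader' )       { push @{ $self->{lines} }, "LDR $text" }
    elsif ( $name eq 'controlfield' ) { push @{ $self->{lines} }, "$self->{field} $text" }
    elsif ( $name eq 'subfield' )     { $self->{field} .= " \$$self->{code} $text" }
    elsif ( $name eq 'datafield' )    { push @{ $self->{lines} }, $self->{field} }
}

package main;
use XML::SAX::ParserFactory;

# Parse a MARCXML string and return one display line per field.
sub tagged_display {
    my ($xml) = @_;
    my $handler = MARCXML::TaggedDisplay->new;
    XML::SAX::ParserFactory->parser( Handler => $handler )->parse_string($xml);
    return @{ $handler->{lines} };
}

# Made-up sample record, for illustration only.
my $xml = <<'XML';
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">12345</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Walden /</subfield>
    <subfield code="c">Henry David Thoreau.</subfield>
  </datafield>
</record>
XML

print "$_\n" for tagged_display($xml);
```

The state machine is small precisely because there are only four elements to track: each start tag resets the text buffer, each end tag decides what to do with it. A handler plugged into Net::OAI::Harvester would presumably collect the same pieces but expose them through methods rather than printing them.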
Using Ed's code as a model, I think I could create a method called MARC that returns a MARC::Record object, like this:

  use Net::OAI::Harvester;
  use MARCXML;

  my $url = 'http://memory.loc.gov/cgi-bin/oai2_0';
  my $harvester = Net::OAI::Harvester->new( baseURL => $url );
  my $records = $harvester->listRecords(
      metadataPrefix  => 'marc21',
      metadataHandler => 'MARCXML'
  );

  while ( my $record = $records->next ) {

      # call the MARC method, returning a MARC::Record object
      my $marc = $record->metadata()->MARC;

      # apply cool MARC::Record methods against the object
      print $marc->title, "\n";

  }

Alternatively, I suppose I could create methods like this:

  $leader  = $record->metadata()->leader;
  $control = $record->metadata()->control;
  $title   = $record->metadata()->datafield( '245', 'a' );
  $author  = $record->metadata()->datafield( '100', 'a' );
  $url     = $record->metadata()->datafield( '856', 'u' );

Is this approach a good idea? On the other hand, maybe I should return the whole record in all of its MARC glory. Which approach is better? Maybe I should do both? Maybe I should return a DOM, as Ed alludes to. Ah, the choices!

[1] http://lists.webjunction.org/wjlists/xml4lib/2007-December/005978.html

-- 
Eric Lease Morgan
University Libraries of Notre Dame