On Dec 13, 2007, at 9:48 AM, Ed Summers wrote:
> use Net::OAI::Harvester;
> use MODSHandler;
>
> my $url = 'http://memory.loc.gov/cgi-bin/oai2_0';
> my $harvester = Net::OAI::Harvester->new(baseURL => $url);
> my $records = $harvester->listRecords(
> metadataPrefix => 'mods',
> metadataHandler => 'MODSHandler'
> );
>
> while ($record = $records->next()) {
> print $record->metadata()->title(), "\n";
> }
>
> ...
>
> Interestingly back in 2000 or whatever when this was written it felt
> like pretty state of the art to use filters in this way. But today it
> seems kind of overkill to have to write a state-machine just to get at
> some XML. The ruby oai library [2] I worked on more recently kind of
> bucks the trend of not trying to create fancy objects for records and
> hand waving memory concerns (which never seemed to surface) and just
> returns back what amounts to a DOM and lets the user figure out what
> they want.
What type(s) of data are the methods called on the object returned by
metadata() (above) expected to return? Only scalars? How about objects?
How about other Perl data structures, like a hash (of hashes)? Is there
a predefined set of methods that can be called on that object?
I suppose the aforementioned MODSHandler can be designed to support
any number of methods returning different types of data. Correct?
For example, the code above is designed to return a title. Additional
methods might return authors, subjects, publishers, etc.
Spurred on by the availability of MBooks from the University of
Michigan [1], I have written the beginnings of a SAX filter for
MARCXML data. Currently it iterates over MARCXML, parses the data,
and prints to STDOUT something looking like a MARC tagged display.
Ironically, this was rather easy because MARCXML only has a limited
number of elements: leader, controlfield, datafield, and subfield.
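
To make the idea concrete, here is a minimal sketch of such a handler, not the actual filter described above. The package name MARCXMLTagged, the tagged_display helper, and the sample record are all made up for illustration, and it assumes XML::SAX (with any SAX2 parser it can find) is installed.

```perl
use strict;
use warnings;

package MARCXMLTagged;    # hypothetical name, for illustration only
use base 'XML::SAX::Base';

sub start_document {
    my ($self) = @_;
    $self->{lines} = [];    # one entry per line of the tagged display
}

sub start_element {
    my ( $self, $el ) = @_;
    my $name = $el->{LocalName};

    # Perl SAX2 keys un-namespaced attributes as "{}name"
    my $attr = sub {
        my $att = $el->{Attributes}{"{}$_[0]"};
        return defined $att ? $att->{Value} : '';
    };

    if ( $name eq 'leader' || $name eq 'controlfield' || $name eq 'subfield' ) {
        $self->{text}    = '';
        $self->{collect} = 1;    # start gathering character data
    }
    if ( $name eq 'controlfield' ) {
        $self->{tag} = $attr->('tag');
    }
    elsif ( $name eq 'datafield' ) {
        ( my $ind = $attr->('ind1') . $attr->('ind2') ) =~ tr/ /_/;
        $self->{line} = $attr->('tag') . ' ' . $ind;
    }
    elsif ( $name eq 'subfield' ) {
        $self->{code} = $attr->('code');
    }
}

sub characters {
    my ( $self, $chars ) = @_;

    # may be called several times per text node, so append
    $self->{text} .= $chars->{Data} if $self->{collect};
}

sub end_element {
    my ( $self, $el ) = @_;
    my $name = $el->{LocalName};
    if ( $name eq 'leader' ) {
        push @{ $self->{lines} }, "LDR $self->{text}";
    }
    elsif ( $name eq 'controlfield' ) {
        push @{ $self->{lines} }, "$self->{tag} $self->{text}";
    }
    elsif ( $name eq 'subfield' ) {
        $self->{line} .= ' $' . $self->{code} . ' ' . $self->{text};
    }
    elsif ( $name eq 'datafield' ) {
        push @{ $self->{lines} }, $self->{line};
    }
    $self->{collect} = 0;    # ignore whitespace between elements
}

package main;
use XML::SAX::ParserFactory;

# Parse a MARCXML string and return the tagged display as one string.
sub tagged_display {
    my ($xml) = @_;
    my $handler = MARCXMLTagged->new;
    XML::SAX::ParserFactory->parser( Handler => $handler )->parse_string($xml);
    return join '', map { "$_\n" } @{ $handler->{lines} };
}

# Made-up sample record, not taken from any real collection.
my $sample_xml = <<'XML';
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">12345</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Walden /</subfield>
    <subfield code="c">Henry David Thoreau.</subfield>
  </datafield>
</record>
XML

print tagged_display($sample_xml);
```

The handler only needs four branches because MARCXML has so few elements; everything else (record, collection) falls through untouched.
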
Using Ed's code as a model, I think I could create a method called
MARC that returns a MARC::Record object, like this:
  use Net::OAI::Harvester;
  use MARCXML;
  my $url = 'http://memory.loc.gov/cgi-bin/oai2_0';
  my $harvester = Net::OAI::Harvester->new( baseURL => $url );
  my $records = $harvester->listRecords(
    metadataPrefix  => 'marc21',
    metadataHandler => 'MARCXML'
  );
  while ( my $record = $records->next ) {
    # call the MARC method returning a MARC::Record object
    my $marc = $record->metadata()->MARC;
    # apply cool MARC::Record methods against the object
    print $marc->title, "\n";
  }
Alternatively, I suppose I could create methods like this:
$leader = $record->metadata()->leader;
$control = $record->metadata()->control;
$title = $record->metadata()->datafield( '245', 'a' );
$author = $record->metadata()->datafield( '100', 'a' );
$url = $record->metadata()->datafield( '856', 'u' );
Is this approach a good idea? On the other hand, maybe I should
return the whole record in all of its MARC glory. Which approach is
better? Maybe I should do both? Maybe I should return a DOM as Ed
alludes to. Ah, the choices!
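
For what it is worth, "both" might look something like the sketch below: the metadata object holds a MARC::Record, and the convenience methods simply delegate to it. The class name MARCMetadata is hypothetical, the record is built by hand from made-up data rather than harvested, and defaulting control() to tag 001 is my own assumption. It assumes MARC::Record is installed.

```perl
use strict;
use warnings;
use MARC::Record;
use MARC::Field;

package MARCMetadata;    # hypothetical name, for illustration only

sub new {
    my ( $class, $marc ) = @_;
    return bless { marc => $marc }, $class;
}

# return the whole record in all of its MARC glory
sub MARC { $_[0]{marc} }

sub leader { $_[0]{marc}->leader }

# control field data; defaulting to tag 001 is an assumption
sub control {
    my ( $self, $tag ) = @_;
    $tag = '001' unless defined $tag;
    my $field = $self->{marc}->field($tag);
    return defined $field ? $field->data : undef;
}

# first matching subfield of the first matching field
sub datafield {
    my ( $self, $tag, $code ) = @_;
    return $self->{marc}->subfield( $tag, $code );
}

package main;

# Made-up record standing in for one a handler would build from MARCXML.
my $marc = MARC::Record->new;
$marc->leader('00000nam a2200000 a 4500');
$marc->append_fields(
    MARC::Field->new( '001', '12345' ),
    MARC::Field->new( '100', '1', ' ', a => 'Thoreau, Henry David,' ),
    MARC::Field->new( '245', '1', '0', a => 'Walden /', c => 'Henry David Thoreau.' ),
);

my $metadata = MARCMetadata->new($marc);
print $metadata->datafield( '245', 'a' ), "\n";    # shortcut accessor
print $metadata->MARC->title, "\n";                # full MARC::Record
```

With this arrangement the shortcuts cost almost nothing to maintain, and anyone who needs more than a title or author can still ask for the full record.
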
[1] http://lists.webjunction.org/wjlists/xml4lib/2007-December/005978.html
--
Eric Lease Morgan
University Libraries of Notre Dame