Hey Eric:

Net::OAI::Harvester (N::O::H) uses XML::SAX for XML parsing, which
provides a standard interface to multiple back-end XML parsers, and
also provides a facility known as XML Filters [1].

Net::OAI::Record::OAI_DC is an example of a SAX filter which receives
SAX events for each metadata record in a response and builds up a
representation of the record. Since oai_dc is standard in oai-pmh-land
it's assumed as a default a lot of the time.

So if you want to retrieve another kind of metadata you have to write
a SAX filter for it, and then reference it when you are calling
getRecord(), listRecords() or listAllRecords().

So for example here's a test script for a MODSHandler detailed below:

--

  use Net::OAI::Harvester;
  use MODSHandler;

  my $url = 'http://memory.loc.gov/cgi-bin/oai2_0';
  my $harvester = Net::OAI::Harvester->new(baseURL => $url);
  my $records = $harvester->listRecords(
     metadataPrefix => 'mods',
     metadataHandler => 'MODSHandler'
  );

  while (my $record = $records->next()) {
      print $record->metadata()->title(), "\n";
  }

--
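
The same handler can be handed to getRecord() when you only want a
single record. A rough sketch, not something I've run against that
server: the identifier below is a made-up placeholder, so swap in one
that the repository actually lists.

```perl
use strict;
use warnings;
use Net::OAI::Harvester;
use MODSHandler;    # assumes MODSHandler.pm is somewhere on @INC

my $url = 'http://memory.loc.gov/cgi-bin/oai2_0';
my $harvester = Net::OAI::Harvester->new(baseURL => $url);

my $record = $harvester->getRecord(
    identifier      => 'oai:example.org:123',   # hypothetical identifier
    metadataPrefix  => 'mods',
    metadataHandler => 'MODSHandler',
);

# The repository reports problems (e.g. an unknown identifier) as an
# OAI error code rather than by dying.
if (my $code = $record->errorCode()) {
    die "OAI error: $code\n";
}

print $record->metadata()->title(), "\n";
```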

And here's a barely functional MODSHandler that just pulls out the title:

--

package MODSHandler;

  use XML::SAX::Base;
  use base qw(XML::SAX::Base);

  sub new {
      my $class = shift;
      return bless {inside => 0}, ref($class) || $class;
  }

  sub title {
      return shift->{title};
  }

  sub start_element {
     my ($self, $element) = @_;
     if ($element->{Name} eq 'title') {$self->{inside} = 1;}
  }

  sub end_element {
      my ($self, $element) = @_;
      if ($element->{Name} eq 'title') {$self->{inside} = 0;}
  }

  sub characters {
      my ($self, $chars) = @_;
      if ($self->{inside}) {
          $self->{title} .= $chars->{Data};
      }
  }

  1;

--
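
If it helps to see what the parser actually does to a handler like
that, here is a small sketch that fires the SAX events by hand. The
TitleSketch package is a hypothetical stand-in with the same callbacks
as MODSHandler but no XML::SAX::Base parent, so it runs with nothing
but core Perl, and the sample document in the comment is invented. It
also shows why characters() appends with .= instead of assigning: a
parser is free to deliver the text of one element in several pieces.

```perl
use strict;
use warnings;

# Hypothetical stand-in for MODSHandler: same callbacks, no base class.
package TitleSketch;

sub new {
    my $class = shift;
    return bless { inside => 0, title => '' }, $class;
}

sub title { return $_[0]{title} }

sub start_element {
    my ($self, $element) = @_;
    $self->{inside} = 1 if $element->{Name} eq 'title';
}

sub end_element {
    my ($self, $element) = @_;
    $self->{inside} = 0 if $element->{Name} eq 'title';
}

sub characters {
    my ($self, $chars) = @_;
    $self->{title} .= $chars->{Data} if $self->{inside};
}

package main;

# Roughly the events a parser would send for this made-up record:
#   <mods><titleInfo><title>Hello, World</title></titleInfo>
#         <note>ignored</note></mods>
my $handler = TitleSketch->new();
$handler->start_element({ Name => 'mods' });
$handler->start_element({ Name => 'titleInfo' });
$handler->start_element({ Name => 'title' });
$handler->characters({ Data => 'Hello, ' });    # text can arrive in chunks
$handler->characters({ Data => 'World' });
$handler->end_element({ Name => 'title' });
$handler->end_element({ Name => 'titleInfo' });
$handler->start_element({ Name => 'note' });
$handler->characters({ Data => 'ignored' });    # not inside <title>, skipped
$handler->end_element({ Name => 'note' });
$handler->end_element({ Name => 'mods' });

print $handler->title(), "\n";
```

The inside flag is the whole state machine: text is only kept while it
is set, so the contents of the note element never reach the title.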

Kind of sad that it takes that much code just to get at the contents
of the title element. Perhaps there are some SAX filters on CPAN that
can build up a DOM-like object for you.

Interestingly, back in 2000 or whatever when this was written, it felt
pretty state of the art to use filters in this way. But today it
seems like overkill to have to write a state machine just to get at
some XML. The ruby oai library [2] I worked on more recently bucks
that trend: it doesn't try to create fancy objects for records, waves
away the memory concerns (which never seemed to surface), and just
returns what amounts to a DOM and lets the user figure out what
they want.

Let me know if you run into any trouble.

//Ed

[1] http://www.xml.com/pub/a/2001/10/10/sax-filters.html
[2] http://oai.rubyforge.org