On Wed, Sep 25, 2013 at 08:34:33AM +0100, dasos ili wrote:
> And my biggest question is how one with such a template could import
> the "logic", the rules behind a format. For instance how you handle a
> query that according to the format says: if the date of publication is
> not stored in the i.e. --1 field, then get it from the --2 field...
I don't know about querying MongoDB or elasticsearch (and I guess those
solutions are much easier when you only need simple queries with no
transformation).
The advantage of my solution is that it uses Perl (and the MARC::MIR DSL,
which is just Perl functions), so I have the full power of the language.
The idea is to parse ISO2709 directly (there are some examples in the
documentation). When I repeatedly need access to a large amount of data, I
also build index files with seek offsets.
About your example: imagine you have dates that are stored, in order of
preference, in 645$c and then 653$z. You can write this query as a sub:
use MARC::MIR;
use Modern::Perl;

sub find_dates (_) {
    my $record = shift;
    # return the dates found in 645$c ...
    my @dates = map_values { $_ } [qw< 645 c >], $record;
    return @dates if @dates;
    # ... else, fall back to those found in 653$z
    map_values { $_ } [qw< 653 z >], $record;
}

marawk {
    my @dates = find_dates;
    @dates and say record_id, ": ", join ', ', @dates;
} "yourfile.mrc";
Another good thing about MARC::MIR is that it's easy to write tests: here is
the script I just tested (easy to translate into a test suite).
#! /usr/bin/perl
use Modern::Perl;
use MARC::MIR;
use YAML;

sub find_dates (_) {
    my $record = shift;
    # return the dates found in 645$c ...
    my @dates = map_values { $_ } [qw< 645 c >], $record;
    return @dates if @dates;
    # ... else, fall back to those found in 653$z
    map_values { $_ } [qw< 653 z >], $record;
}
for
( [ header =>
[ [ '001' => 1 ]
, [ 645 => [[c => "1976/01/14" ]] ]
, [ 653 => [[z => "JUNK" ]] ]
] ]
, [ header =>
[ [ '001' => 2 ]
, [ 645 => [[a => "JUNK" ]] ]
, [ 653 => [[z => "1976/01/14" ]] ]
] ]
, [ header =>
[ [ '001' => 3 ]
, [ 645 => [[a => "JUNK" ]] ]
, [ 653 => [[k => "JUNK" ]] ]
] ]
) { say record_id, ": ", join ', ', find_dates }
If you plan to store the data in MongoDB, I recommend
MARC::MIR::Template to transform the data. I wrote an example (actually I
copy/pasted it from the test suite) previously.
Here is your MARC:
- [001, PPNxxxx ]
- [200, [ [a, Doe], [b, john], [b, elias], [b, frederik] ]]
- [200, [ [a, Doe], [b, jane] ]]
- [300, [ [a, "i can haz title"], [b, "also subs"] ]]
here is a template you can store in a YAML file:
001: id
200: [ authors, { a: name, b: firstname } ]
300: { a: title, b: subtitle }
700: [ auth_author, { a: name, b: firstname } ]
701: [ auth_author, { a: name, b: firstname } ]
The structure you can store in MongoDB is:
authors:
- { name: Doe, firstname: [john, elias, frederik] }
- { name: Doe, firstname: jane }
title: "i can haz title"
subtitle: "also subs"
id: PPNxxxx
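For illustration, the mapping such a template describes can be sketched in Python (`apply_template` is a hypothetical helper written for this sketch, not MARC::MIR::Template's actual API, and the template is hard-coded as a dict instead of being loaded from YAML):

```python
def apply_template(record, template):
    """Turn a MARC record (a list of [tag, value] pairs) into a nested dict.

    Rules, mirroring the template above:
      - tag -> "key"          : control field, stored as a plain value
      - tag -> {code: key}    : subfields merged into the top-level dict
      - tag -> ["key", {...}] : repeatable field, one dict appended per
                                occurrence; repeated subfields become lists
    """
    result = {}
    for tag, value in record:
        rule = template.get(tag)
        if rule is None:
            continue
        if isinstance(rule, str):                  # e.g. 001 -> id
            result[rule] = value
        elif isinstance(rule, dict):               # e.g. 300 -> title/subtitle
            for code, sub in value:
                if code in rule:
                    result[rule[code]] = sub
        else:                                      # e.g. 200 -> authors
            key, submap = rule
            item = {}
            for code, sub in value:
                name = submap.get(code)
                if name is None:
                    continue
                if name in item:                   # repeated subfield -> list
                    if not isinstance(item[name], list):
                        item[name] = [item[name]]
                    item[name].append(sub)
                else:
                    item[name] = sub
            result.setdefault(key, []).append(item)
    return result

marc = [
    ["001", "PPNxxxx"],
    ["200", [["a", "Doe"], ["b", "john"], ["b", "elias"], ["b", "frederik"]]],
    ["200", [["a", "Doe"], ["b", "jane"]]],
    ["300", [["a", "i can haz title"], ["b", "also subs"]]],
]
template = {
    "001": "id",
    "200": ["authors", {"a": "name", "b": "firstname"}],
    "300": {"a": "title", "b": "subtitle"},
}
doc = apply_template(marc, template)
```

With the MARC and template above, `doc` comes out as exactly the structure shown.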
so you can use JSONPath:
$.title
$.authors[*].name
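Once the document is in that shape, those two JSONPath expressions correspond to plain dictionary/array access; a quick Python illustration of what they select:

```python
doc = {
    "id": "PPNxxxx",
    "title": "i can haz title",
    "subtitle": "also subs",
    "authors": [
        {"name": "Doe", "firstname": ["john", "elias", "frederik"]},
        {"name": "Doe", "firstname": "jane"},
    ],
}

# $.title -> the record's title
title = doc["title"]

# $.authors[*].name -> every author name
names = [a["name"] for a in doc["authors"]]
```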
regards
--
Marc Chantreux
Université de Strasbourg, Direction Informatique
14 Rue René Descartes,
67084 STRASBOURG CEDEX
☎: 03.68.85.57.40
http://unistra.fr
"Don't believe everything you read on the Internet"
-- Abraham Lincoln