On Wed, Sep 25, 2013 at 08:34:33AM +0100, dasos ili wrote:
> And my biggest question is how one with such a template could import
> the "logic", the rules behind a format. For instance how you handle a
> query that according to the format says: if the date of publication is
> not stored in the i.e. --1 field, then get it from the --2 field...

I don't know about querying MongoDB or Elasticsearch (and I guess those
solutions are way easier when you only need simple queries with no
transformation).

The good thing about my solution is that it uses Perl (and the MARC::MIR
DSL, which is just Perl functions), so I have all the power of the
language. The idea is to parse ISO2709 directly (there are some examples
in the documentation). When I repeatedly need access to a big amount of
data, I also build index files with seek offsets.
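
For example, one way to build such an index (just a sketch, assuming the
usual \x1D ISO2709 record terminator; the file name and the tab-separated
output are only examples):

#! /usr/bin/perl
# sketch: build a seek-offset index for an ISO2709 file
use Modern::Perl;

my $file = shift // "yourfile.mrc";
open my $fh, '<:raw', $file or die "$file: $!";

$/ = "\x1D";                            # ISO2709 record terminator
while (1) {
    my $offset = tell $fh;              # byte offset of the next record
    defined( my $raw = <$fh> ) or last;
    say join "\t", $., $offset, length $raw;
}

A later run can then seek to a stored offset and read a single record
instead of rescanning the whole file.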

About your example: imagine you have dates stored, in order of preference,
in 645$c and then 653$z. You can write this query as a sub:

use MARC::MIR;
use Modern::Perl;

sub find_dates (_) {
    my $record = shift;
    my @dates = map_values { $_ } [qw< 645 c >], $record;
    # return the dates found in 645$c
    return @dates if @dates;
    # otherwise, return those found in 653$z
    map_values { $_ } [qw< 653 z >], $record ;
}

marawk {
    my @dates = find_dates;
    @dates and say record_id, ": ", join ', ', @dates;
} "yourfile.mrc"

Another good thing about MARC::MIR is that it's easy to write tests: here
is the script I just tested (easy to translate into a proper test suite).

#! /usr/bin/perl
use Modern::Perl;
use YAML;
use MARC::MIR;

sub find_dates (_) {
    my $record = shift;
    my @dates = map_values { $_ } [qw< 645 c >], $record;
    # return the dates found in 645$c
    return @dates if @dates;
    # otherwise, return those found in 653$z
    map_values { $_ } [qw< 653 z >], $record ;
}

for
( [ header =>
    [ [ '001' => 1 ]
    , [ 645 => [[c => "1976/01/14" ]] ]
    , [ 653 => [[z => "JUNK" ]] ]
    ] ]
, [ header =>
    [ [ '001' => 2 ]
    , [ 645 => [[a => "JUNK" ]] ]
    , [ 653 => [[z => "1976/01/14" ]] ]
    ] ]
, [ header =>
    [ [ '001' => 3 ]
    , [ 645 => [[a => "JUNK" ]] ]
    , [ 653 => [[k => "JUNK" ]] ]
    ] ]
)  { say record_id, ": ", join ', ', find_dates }
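
To show what that translation could look like, here is a sketch of the
same checks written with Test::More (the test names and expected lists are
only my reading of what the script above prints):

#! /usr/bin/perl
use Modern::Perl;
use Test::More tests => 3;
use MARC::MIR;

sub find_dates (_) {
    my $record = shift;
    my @dates = map_values { $_ } [qw< 645 c >], $record;
    return @dates if @dates;
    map_values { $_ } [qw< 653 z >], $record;
}

my @cases =
( [ "date taken from 645\$c"
  , [ header => [ ['001' => 1], [645 => [[c => "1976/01/14"]]], [653 => [[z => "JUNK"]]] ] ]
  , ["1976/01/14"] ]
, [ "fallback to 653\$z"
  , [ header => [ ['001' => 2], [645 => [[a => "JUNK"]]], [653 => [[z => "1976/01/14"]]] ] ]
  , ["1976/01/14"] ]
, [ "no date at all"
  , [ header => [ ['001' => 3], [645 => [[a => "JUNK"]]], [653 => [[k => "JUNK"]]] ] ]
  , [] ]
);

for my $case (@cases) {
    my ( $desc, $record, $expected ) = @$case;
    is_deeply [ find_dates $record ], $expected, $desc;
}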

If you plan to store the data in MongoDB, I recommend MARC::MIR::Template
to transform the data. I wrote an example (actually I copy/pasted it from
the test suite) previously:

here is your MARC record (as YAML):

- [001, PPNxxxx ]
- [200, [ [a, Doe], [b, john], [b, elias], [b, frederik] ]]
- [200, [ [a, Doe], [b, jane] ]]
- [300, [ [a, "i can haz title"], [b, "also subs"] ]] 

here is a template you can store in a YAML file:

001: id
200: [ authors, { a: name, b: firstname } ]
300: { a: title, b: subtitle }
700: [ auth_author, { a: name, b: firstname } ]
701: [ auth_author, { a: name, b: firstname } ]

The structure you can store in MongoDB is:

authors:
    - { name: Doe, firstname: [john, elias, frederik] }
    - { name: Doe, firstname: jane }
title: "i can haz title"
subtitle: "also subs"
id: PPNxxxx 
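
As an aside, storing that structure from Perl could look like this (just
a sketch, assuming a recent MongoDB perl driver; the connection string and
the "catalog.records" namespace are only examples):

#! /usr/bin/perl
# sketch: store the document above with the MongoDB perl driver
use Modern::Perl;
use MongoDB;

my $records = MongoDB->connect('mongodb://localhost')->ns('catalog.records');

$records->insert_one(
    { id       => 'PPNxxxx'
    , title    => 'i can haz title'
    , subtitle => 'also subs'
    , authors  =>
      [ { name => 'Doe', firstname => [qw< john elias frederik >] }
      , { name => 'Doe', firstname => 'jane' }
      ]
    } );

# query it back by author name
say $records->find_one({ 'authors.name' => 'Doe' })->{title};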

Once stored, you can address the data with JSONPath:

$.title
$.authors[*].name
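
If you want to try such paths from Perl, the JSON::Path module on CPAN can
evaluate them against the plain data structure (again only a sketch; $doc
repeats the structure above so the snippet runs on its own):

#! /usr/bin/perl
# sketch: evaluate the JSONPath expressions with the JSON::Path CPAN module
use Modern::Perl;
use JSON::Path;

my $doc =
{ id       => 'PPNxxxx'
, title    => 'i can haz title'
, subtitle => 'also subs'
, authors  =>
  [ { name => 'Doe', firstname => [qw< john elias frederik >] }
  , { name => 'Doe', firstname => 'jane' }
  ]
};

say JSON::Path->new('$.title')->value($doc);                 # "i can haz title"
say for JSON::Path->new('$.authors[*].name')->values($doc);  # "Doe" twice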

regards


-- 
Marc Chantreux
Université de Strasbourg, Direction Informatique
14 Rue René Descartes,
67084  STRASBOURG CEDEX
☎: 03.68.85.57.40
http://unistra.fr
"Don't believe everything you read on the Internet"
    -- Abraham Lincoln