On Wed, Sep 25, 2013 at 08:34:33AM +0100, dasos ili wrote:

> And my biggest question is how one with such a template could import
> the "logic", the rules behind a format. For instance how you handle a
> query that according to the format says: if the date of publication is
> not stored in the i.e. --1 field, then get it from the --2 field...

I don't know about querying MongoDB or Elasticsearch (and I guess those
solutions are much easier when you need simple queries with no
transformation). The good thing about my solution is that it uses Perl
(and the MARC::MIR DSL, which is just Perl functions), so I have all the
power of the language.

The idea is to parse ISO2709 directly (there are some examples in the
documentation). When I repeatedly need access to a large amount of data,
I also build index files with seek offsets.

About your example: imagine you have dates that are stored, in order of
preference, in 645$c and then in 653$z. You can write this query as a sub:

    use MARC::MIR;
    use Modern::Perl;

    sub find_dates (_) {
        my $record = shift;
        my @dates = map_values { $_ } [qw< 645 c >], $record;
        # return the dates found in 645$c
        return @dates if @dates;
        # else, return those found in 653$z
        map_values { $_ } [qw< 653 z >], $record;
    }

    marawk {
        my @dates = find_dates;
        @dates and say record_id, ": ", join ', ', @dates;
    } "yourfile.mrc";

Another good thing about MARC::MIR is that it's easy to write tests.
Here is the script I just tested (easy to translate into a test suite):
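(An aside on the seek-offset index files mentioned earlier. This is my own
minimal sketch, not a MARC::MIR API: it assumes only that ISO2709 records
end with the record terminator byte \x1d, and the names `build_index` and
`fetch_record` are mine, chosen for illustration.)

    #!/usr/bin/perl
    use strict;
    use warnings;

    # One pass over an ISO2709 file, remembering the byte offset
    # where each record starts. \x1d is the ISO2709 record terminator.
    sub build_index {
        my $file = shift;
        open my $fh, '<:raw', $file or die "$file: $!";
        local $/ = "\x1d";      # read one record at a time
        my @offsets;
        my $pos = 0;
        while (my $rec = <$fh>) {
            push @offsets, $pos;
            $pos = tell $fh;
        }
        return \@offsets;
    }

    # With the offsets saved, you can seek straight to record $n
    # instead of re-reading the whole file.
    sub fetch_record {
        my ($file, $offsets, $n) = @_;
        open my $fh, '<:raw', $file or die "$file: $!";
        seek $fh, $offsets->[$n], 0;
        local $/ = "\x1d";
        scalar <$fh>;
    }

In practice I'd dump the offsets to a sidecar file (Storable, or one
offset per line) so later runs skip the indexing pass entirely.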
    #!/usr/bin/perl
    use Modern::Perl;
    use MARC::MIR;
    use YAML;

    sub find_dates (_) {
        my $record = shift;
        my @dates = map_values { $_ } [qw< 645 c >], $record;
        # return the dates found in 645$c
        return @dates if @dates;
        # else, return those found in 653$z
        map_values { $_ } [qw< 653 z >], $record;
    }

    for (
        [ header =>
            [ [ '001' => 1 ]
            , [ 645 => [[ c => "1976/01/14" ]] ]
            , [ 653 => [[ z => "JUNK" ]] ] ] ],
        [ header =>
            [ [ '001' => 2 ]
            , [ 645 => [[ a => "JUNK" ]] ]
            , [ 653 => [[ z => "1976/01/14" ]] ] ] ],
        [ header =>
            [ [ '001' => 3 ]
            , [ 645 => [[ a => "JUNK" ]] ]
            , [ 653 => [[ k => "JUNK" ]] ] ] ],
    ) {
        say record_id, ": ", join ', ', find_dates;
    }

If you plan to store the data in MongoDB, I recommend MARC::MIR::Template
to transform the data. I wrote an example (actually I copy/pasted it from
the test suite) previously. Here is your MARC record:

    - [ 001, PPNxxxx ]
    - [ 200, [ [a, Doe], [b, john], [b, elias], [b, frederik] ] ]
    - [ 200, [ [a, Doe], [b, jane] ] ]
    - [ 300, [ [a, "i can haz title"], [b, "also subs"] ] ]

Here is a template you can store in a YAML file:

    001: id
    200: [ authors, { a: name, b: firstname } ]
    300: { a: title, b: subtitle }
    700: [ auth_author, { a: name, b: firstname } ]
    701: [ auth_author, { a: name, b: firstname } ]

The structure you can store in MongoDB is:

    authors:
      - { name: Doe, firstname: [john, elias, frederik] }
      - { name: Doe, firstname: jane }
    title: "i can haz title"
    subtitle: "also subs"
    id: PPNxxxx

so you can use JSONPath:

    $.title
    $.authors[*].name

regards

-- 
Marc Chantreux
Université de Strasbourg, Direction Informatique
14 Rue René Descartes, 67084 STRASBOURG CEDEX
☎: 03.68.85.57.40
http://unistra.fr
"Don't believe everything you read on the Internet" -- Abraham Lincoln