LISTSERV 16.5 - CODE4LIB Archives

Reads in MarcXML and stores the records in a DB. Supports deduplication and
record editing; records can be exported by hand or by cron job.

http://sourceforge.net/projects/bibnet/

It's a specialized application for journal articles, but it may work for any 
kind of MarcXML, with some restrictions.

Can handle large sets of data.

Markus Fischer

On 09.09.2013 05:00, CODE4LIB automatic digest system wrote:
> There are 2 messages totaling 79 lines in this issue.
>
> Topics of the day:
>
>    1. XML split and transform in Java (2)
>
> ----------------------------------------------------------------------
>
> Date:    Sun, 8 Sep 2013 16:22:22 +0000
> From:    Tod Olson <[log in to unmask]>
> Subject: XML split and transform in Java
>
> code4lib,
>
> I'm looking for some advice on splitting and transforming XML data using Java. The context is writing a mixin for SolrMARC to enhance our bib data, bringing in table of contents and summary data. The data is in XML, isomorphic to MARCXML. I need to split it up, transform it, and store it for use at import time. I expect the input XML to be up to a few GB, so slurping the whole thing into a DOM seems questionable. I've done one implementation for a split-only version of the problem, but the transform requirement is causing me to re-think.
>
> And maybe someone out there has already done this exact thing.
>
> I think the basic approach is to read a record from start tag to end tag, and create a reader/stream/whatever to hand exactly that record to the transform API. Lots of options for this: SAX, StAX events, or what have you. Any thoughts of what seems the most straightforward for this split-and-transform scenario would be welcome.
>
> On a related note, any thoughts on your favorite light-weight key/value pair persistent storage for Java would be welcome. I expect the data to be a little large for a serialized HashMap.
>
> Best,
>
> -Tod
>
>
> Tod Olson <[log in to unmask]>
> Systems Librarian
> University of Chicago Library
>
> ------------------------------
>
> Date:    Sun, 8 Sep 2013 20:22:24 +0200
> From:    Chris Fitzpatrick <[log in to unmask]>
> Subject: Re: XML split and transform in Java
>
> Hi,
>
> Would something like this work?
>
> https://github.com/marc4j/marc4j/blob/master/src/org/marc4j/samples/StylesheetChainExample.java
>
>
>
>
> ------------------------------
>
> End of CODE4LIB Digest - 7 Sep 2013 to 8 Sep 2013 (#2013-231)
> *************************************************************
>
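
For the split-and-transform approach described in the quoted question (read one record from start tag to end tag, then hand exactly that record to the transform API), here is a minimal sketch using only JDK APIs (StAX plus JAXP XSLT). It is not taken from SolrMARC, marc4j or bibnet. The class name RecordSplitter, the method names, and the element name "record" are assumptions made for illustration. Each record is copied into a small DOM before transforming, so memory use should depend on the largest record rather than on the whole file.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper class; all names here are illustrative assumptions.
public class RecordSplitter {

    /** Streams through xml; every element whose local name is recordName is
     *  transformed on its own with the given XSLT and handed to sink. */
    public static void splitAndTransform(Reader xml, String recordName, String xslt,
                                         Consumer<String> sink) {
        try {
            // Compile the stylesheet once, reuse it for every record.
            Templates templates = TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(xslt)));
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            XMLInputFactory inputFactory = XMLInputFactory.newInstance();
            inputFactory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            XMLStreamReader reader = inputFactory.createXMLStreamReader(xml);
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && recordName.equals(reader.getLocalName())) {
                    Document record = readSubtree(reader, builder.newDocument());
                    StringWriter out = new StringWriter();
                    templates.newTransformer().transform(new DOMSource(record), new StreamResult(out));
                    sink.accept(out.toString());
                }
            }
            reader.close();
        } catch (Exception e) {
            throw new IllegalStateException("split/transform failed", e);
        }
    }

    /** Copies the element the reader is positioned on (and its descendants) into doc.
     *  On return the reader sits on the matching END_ELEMENT. */
    private static Document readSubtree(XMLStreamReader r, Document doc) throws XMLStreamException {
        Node current = doc;
        int depth = 0;
        for (int event = r.getEventType(); ; event = r.next()) {
            switch (event) {
                case XMLStreamConstants.START_ELEMENT:
                    Element e = doc.createElementNS(emptyToNull(r.getNamespaceURI()),
                            qName(r.getPrefix(), r.getLocalName()));
                    for (int i = 0; i < r.getAttributeCount(); i++) {
                        e.setAttributeNS(emptyToNull(r.getAttributeNamespace(i)),
                                qName(r.getAttributePrefix(i), r.getAttributeLocalName(i)),
                                r.getAttributeValue(i));
                    }
                    current.appendChild(e);
                    current = e;
                    depth++;
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    current = current.getParentNode();
                    if (--depth == 0) return doc;
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                    current.appendChild(doc.createTextNode(r.getText()));
                    break;
                default: // comments and processing instructions are dropped in this sketch
                    break;
            }
        }
    }

    private static String emptyToNull(String s) {
        return (s == null || s.isEmpty()) ? null : s;
    }

    private static String qName(String prefix, String local) {
        return (prefix == null || prefix.isEmpty()) ? local : prefix + ":" + local;
    }

    public static void main(String[] args) {
        String xml = "<collection><record><controlfield tag=\"001\">a1</controlfield></record>"
                + "<record><controlfield tag=\"001\">b2</controlfield></record></collection>";
        String xslt = "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:output method=\"text\"/>"
                + "<xsl:template match=\"/record\">"
                + "<xsl:value-of select=\"controlfield[@tag='001']\"/>"
                + "</xsl:template></xsl:stylesheet>";
        // Prints one transformed record per line.
        splitAndTransform(new StringReader(xml), "record", xslt, System.out::println);
    }
}
```

For a multi-GB file the Reader would wrap a file stream, and the sink would write each result to whatever store is chosen instead of printing it. If the input declares the MARCXML namespace, the stylesheet has to bind a prefix to that namespace and match on the prefixed names; the sketch above matches un-namespaced elements only.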