Hi,
Would something like this work?
https://github.com/marc4j/marc4j/blob/master/src/org/marc4j/samples/StylesheetChainExample.java
On Sun, Sep 8, 2013 at 6:22 PM, Tod Olson <[log in to unmask]> wrote:
> code4lib,
>
> I'm looking for some advice on splitting and transforming XML data using
> Java. The context is writing a mixin for SolrMARC to enhance our bib data,
> bringing in table of contents and summary data. The data is in XML,
> isomorphic to MARCXML. I need to split it up, transform it, and store it
> for use at import time. I expect the input XML to be up to a few GB, so
> slurping the whole thing into a DOM seems questionable. I've done one
> implementation for a split-only version of the problem, but the transform
> requirement is causing me to re-think.
>
> And maybe someone out there has already done this exact thing.
>
> I think the basic approach is to read a record from start tag to end tag,
> and create a reader/stream/whatever to hand exactly that record to the
> transform API. Lots of options for this: SAX, StAX events, or what have
> you. Any thoughts on which seems most straightforward for this
> split-and-transform scenario would be welcome.
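Inline, on the split-and-transform question: one JDK-only way to do this is to use StAX to find each record's start tag, copy events through the matching end tag into a small per-record buffer, and run a precompiled XSLT over that buffer. Only one record is in memory at a time, so a multi-GB input should not be a problem. This is a minimal sketch, not how SolrMARC or marc4j handle it. The class name `RecordSplitter`, the `process` method, and matching records by local name only (ignoring namespace) are hypothetical choices for illustration. `IS_REPAIRING_NAMESPACES` is turned on so that a namespace declared on the enclosing collection element (as MARCXML often does) should be re-declared on each copied record. That is worth verifying against whichever StAX implementation you end up using.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.function.Consumer;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: streams a large XML file and transforms one record at a time.
class RecordSplitter {
    private final XMLInputFactory inFactory = XMLInputFactory.newInstance();
    private final XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
    private final Templates templates; // compiled once, reused for every record

    RecordSplitter(String xslt) throws TransformerException {
        templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xslt)));
        // Ask the writer to add any namespace declarations a copied record needs
        outFactory.setProperty(XMLOutputFactory.IS_REPAIRING_NAMESPACES, Boolean.TRUE);
    }

    // Hands the transformed text of each element whose local name is recordName to sink.
    public void process(Reader input, String recordName, Consumer<String> sink)
            throws XMLStreamException, TransformerException {
        XMLEventReader events = inFactory.createXMLEventReader(input);
        try {
            while (events.hasNext()) {
                XMLEvent event = events.nextEvent();
                if (event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals(recordName)) {
                    sink.accept(transform(copyRecord(event, events)));
                }
            }
        } finally {
            events.close();
        }
    }

    // Serializes events from the record's start tag through its matching end tag.
    private String copyRecord(XMLEvent start, XMLEventReader events) throws XMLStreamException {
        StringWriter buf = new StringWriter();
        XMLEventWriter writer = outFactory.createXMLEventWriter(buf);
        writer.add(start);
        int depth = 1; // counts every nested element, so same-named children are safe
        while (depth > 0) {
            XMLEvent e = events.nextEvent();
            if (e.isStartElement()) {
                depth++;
            } else if (e.isEndElement()) {
                depth--;
            }
            writer.add(e);
        }
        writer.flush();
        writer.close();
        return buf.toString();
    }

    // Runs the compiled stylesheet over a single serialized record.
    private String transform(String record) throws TransformerException {
        StringWriter out = new StringWriter();
        templates.newTransformer().transform(
                new StreamSource(new StringReader(record)), new StreamResult(out));
        return out.toString();
    }
}
```

Usage would look something like `new RecordSplitter(xsltText).process(reader, "record", out -> store(out))`, where `store` is a placeholder for whatever key/value store you settle on. Compiling the stylesheet once into `Templates` and creating a cheap `Transformer` per record avoids reparsing the XSLT for every record.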
>
> On a related note, any thoughts on your favorite lightweight, persistent
> key/value store for Java would be welcome. I expect the data to be
> a little large for a serialized HashMap.
>
> Best,
>
> -Tod
>
>
> Tod Olson <[log in to unmask]>
> Systems Librarian
> University of Chicago Library
>