Hi,

Would something like this work?

https://github.com/marc4j/marc4j/blob/master/src/org/marc4j/samples/StylesheetChainExample.java

On Sun, Sep 8, 2013 at 6:22 PM, Tod Olson wrote:

> code4lib,
>
> I'm looking for some advice on splitting and transforming XML data using
> Java. The context is writing a mixin for SolrMARC to enhance our bib data,
> bringing in table of contents and summary data. The data is in XML,
> isomorphic to MARCXML. I need to split it up, transform it, and store it
> for use at import time. I expect the input XML to be up to a few GB, so
> slurping the whole thing into a DOM seems questionable. I've done one
> implementation for a split-only version of the problem, but the transform
> requirement is causing me to re-think.
>
> And maybe someone out there has already done this exact thing.
>
> I think the basic approach is to read a record from start tag to end tag,
> and create a reader/stream/whatever to hand exactly that record to the
> transform API. Lots of options for this: SAX, StAX events, or what have
> you. Any thoughts of what seems the most straightforward for this
> split-and-transform scenario would be welcome.
>
> On a related note, any thoughts on your favorite light-weight key/value
> pair persistent storage for Java would be welcome. I expect the data to be
> a little large for a serialized HashMap.
>
> Best,
>
> -Tod
>
>
> Tod Olson
> Systems Librarian
> University of Chicago Library
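
In case it helps, here is a rough sketch of the record-at-a-time approach described in the quoted message: a StAX reader walks the large file, and each record element is handed to a Transformer through a StAXSource, so the whole document is never loaded into a DOM. RecordSplitter and splitAndTransform are made-up names for illustration, not part of SolrMARC or marc4j, and I'm assuming the records are elements with a known local name such as "record". Treat it as a starting point only; I have not run it against real MARCXML, and namespace declarations that live on the root element may need extra handling in the per-record output.

```java
import java.io.Reader;
import java.io.StringWriter;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

/** Hypothetical helper for illustration; not part of SolrMARC or marc4j. */
final class RecordSplitter {

    /**
     * Streams through the XML, transforms every element whose local name is
     * recordElement, and passes each result string to sink.
     * Returns the number of records processed.
     */
    static int splitAndTransform(Reader xml, String recordElement,
                                 Transformer transformer, Consumer<String> sink)
            throws XMLStreamException, TransformerException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(xml);
        // Each record is emitted as a fragment, so skip the XML declaration.
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.isStartElement()
                        && recordElement.equals(reader.getLocalName())) {
                    StringWriter out = new StringWriter();
                    // The transformer consumes events up to the matching end
                    // tag, so only this one record is held in memory.
                    transformer.transform(new StAXSource(reader),
                                          new StreamResult(out));
                    sink.accept(out.toString());
                    count++;
                    // Do not advance here: re-check the current event first,
                    // in case the reader already sits on the next record.
                } else {
                    reader.next();
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }
}
```

Passing a Transformer built with TransformerFactory.newInstance().newTransformer(new StreamSource(stylesheetFile)) would apply a stylesheet to each record, while the no-argument newTransformer() just copies each record out unchanged. The sink could write to whatever key/value store ends up being chosen. The loop re-checks the current event after each transform instead of always calling next(), since where the reader is left after transform() may vary between implementations.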