This project reads in MarcXML and stores the records in a database. It
supports deduplication and record editing, and records can be exported by
hand or by cron job:

http://sourceforge.net/projects/bibnet/

It is a specialized application for journal articles, but it may work for
any kind of MarcXML with some restrictions. It can handle large sets of
data.

Markus Fischer

On 09.09.2013 05:00, CODE4LIB automatic digest system wrote:
> There are 2 messages totaling 79 lines in this issue.
>
> Topics of the day:
>
>   1. XML split and transform in Java (2)
>
> ----------------------------------------------------------------------
>
> Date:    Sun, 8 Sep 2013 16:22:22 +0000
> From:    Tod Olson <[log in to unmask]>
> Subject: XML split and transform in Java
>
> code4lib,
>
> I'm looking for some advice on splitting and transforming XML data using
> Java. The context is writing a mixin for SolrMARC to enhance our bib data,
> bringing in table of contents and summary data. The data is in XML,
> isomorphic to MARCXML. I need to split it up, transform it, and store it
> for use at import time. I expect the input XML to be up to a few GB, so
> slurping the whole thing into a DOM seems questionable. I've done one
> implementation for a split-only version of the problem, but the transform
> requirement is causing me to re-think.
>
> And maybe someone out there has already done this exact thing.
>
> I think the basic approach is to read a record from start tag to end tag,
> and create a reader/stream/whatever to hand exactly that record to the
> transform API. Lots of options for this: SAX, StAX events, or what have
> you. Any thoughts of what seems the most straightforward for this
> split-and-transform scenario would be welcome.
>
> On a related note, any thoughts on your favorite light-weight key/value
> pair persistent storage for Java would be welcome. I expect the data to be
> a little large for a serialized HashMap.
>
> Best,
>
> -Tod
>
>
> Tod Olson <[log in to unmask]>
> Systems Librarian
> University of Chicago Library
>
> ------------------------------
>
> Date:    Sun, 8 Sep 2013 20:22:24 +0200
> From:    Chris Fitzpatrick <[log in to unmask]>
> Subject: Re: XML split and transform in Java
>
> Hi,
>
> Would something like this work?
>
> https://github.com/marc4j/marc4j/blob/master/src/org/marc4j/samples/StylesheetChainExample.java
>
> ------------------------------
>
> End of CODE4LIB Digest - 7 Sep 2013 to 8 Sep 2013 (#2013-231)
> *************************************************************
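
For the split-and-transform approach described in the quoted question (read one record from start tag to end tag, then hand exactly that record to the transform API), here is a rough sketch using only the StAX and javax.xml.transform classes that ship with the JDK. The class name RecordSplitter, the method splitAndTransform, and the element name "record" are my own assumptions for illustration, and the identity transformer in main stands in for a real stylesheet. Only one record is buffered in memory at a time. Note that namespace declarations inherited from an enclosing element may not be copied into each fragment, so check this against your own data.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.function.Consumer;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of SolrMARC or marc4j.
final class RecordSplitter {

    // Streams the input, buffers one <record> subtree at a time, transforms it,
    // and passes the transformed text to the sink (e.g. a key/value store).
    static void splitAndTransform(Reader input, Transformer transformer,
                                  Consumer<String> sink) throws Exception {
        XMLEventReader events = XMLInputFactory.newInstance().createXMLEventReader(input);
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        while (events.hasNext()) {
            XMLEvent event = events.nextEvent();
            boolean recordStart = event.isStartElement()
                    && "record".equals(event.asStartElement().getName().getLocalPart());
            if (!recordStart) {
                continue; // skip everything outside a record
            }
            // Copy events from the record's start tag to its matching end tag.
            StringWriter buffer = new StringWriter();
            XMLEventWriter writer = outputFactory.createXMLEventWriter(buffer);
            int depth = 0;
            while (true) {
                writer.add(event);
                if (event.isStartElement()) {
                    depth++;
                } else if (event.isEndElement() && --depth == 0) {
                    break;
                }
                event = events.nextEvent();
            }
            writer.close();
            // Hand exactly this one record to the transform API.
            StringWriter result = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(buffer.toString())),
                                  new StreamResult(result));
            sink.accept(result.toString());
        }
        events.close();
    }

    public static void main(String[] args) throws Exception {
        String sample = "<collection>"
                + "<record><controlfield tag=\"001\">a1</controlfield></record>"
                + "<record><controlfield tag=\"001\">b2</controlfield></record>"
                + "</collection>";
        // Identity transform as a placeholder; a real stylesheet would be loaded with
        // TransformerFactory.newInstance().newTransformer(new StreamSource(xsltFile)).
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        splitAndTransform(new StringReader(sample), identity, System.out::println);
    }
}
```

For a multi-GB file, pass a buffered file reader instead of the StringReader, and let the sink write each transformed record into whatever persistent store is chosen rather than collecting the records in memory.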