> In any case, the trick in my mind is how to represent MARC in JSON
> (disclaimer: I haven't tried to do this yet). Breaking it into pieces that
> index well but which also can be recombined without going through
> contortions doesn't sound easy because the obvious solution of converting
> each field into an object strikes me as more awkward than it should be. My
> gut reaction would be to store the entire MARC record in MARCXML, and
> normalize and index field values to facilitate search/retrieval.
> 
> JSON may be a great data exchange format, but it's not a markup language
> like XML, so doing things like preserving field order or just getting a
> bird's eye view of content across multiple fields or subfields becomes more
> complex.

This is exactly my feeling, and I've been struggling with the same idea of "storing MARC" in the context of a NoSQL-type (or wide-column, or BigTable, or...) implementation.  I think this runs squarely up against the data structure-vs-serialization issue [1]: in MARC the two are indelibly fused, which is limiting.
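
To make the "each field as an object" idea from the quote concrete, here is a rough Python sketch.  The layout, key names, and record values are all made up for illustration (this is not an established MARC-in-JSON format, and the leader is a placeholder); the only point is that JSON arrays keep field and subfield order, while a small helper pulls out values for indexing.

```python
import json

# Hypothetical record in an illustrative layout (not an established format).
# Fields and subfields are JSON arrays, so their order survives a round trip.
record = {
    "leader": "00000nam a2200000 a 4500",  # placeholder leader
    "fields": [
        {"tag": "001", "value": "ex0001"},
        {"tag": "245", "ind1": "1", "ind2": "0",
         "subfields": [["a", "An example title /"], ["c", "Jane Doe."]]},
        {"tag": "650", "ind1": " ", "ind2": "0",
         "subfields": [["a", "Examples."]]},
        {"tag": "650", "ind1": " ", "ind2": "0",
         "subfields": [["a", "Sketches."]]},
    ],
}


def subfield_values(rec, tag, code):
    """Collect every value for one tag/subfield pair, in record order."""
    return [
        value
        for field in rec["fields"]
        if field["tag"] == tag
        for sub_code, value in field.get("subfields", [])
        if sub_code == code
    ]


# Serialize and parse again: nothing is lost or reordered.
restored = json.loads(json.dumps(record))
print(restored == record)
print(subfield_values(restored, "650", "a"))
```

Whether this counts as "awkward" is a matter of taste: the structure round-trips cleanly, but every consumer has to know the made-up key names, which is the serialization problem all over again.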

As someone who's spent a decade learning, defending, and loving the intricacies of all that is XML (and XPath, and XSLT, and XSL-FO, and XLink, and…), I used to snub JSON because of the things that Kyle mentions.  But JSON is not XML.  It's a simpler data structure, and in many ways that can be very freeing.

For example, the DCTERMS element set can be represented as a hierarchy from the original DCMES (though nobody seems to do this).  The fact that DC is data-structure-agnostic means that it can be stored in either XML or JSON equally well (and, with some common practice, serialized between the two), based on your needs.  You can do this precisely because the data model is format-independent.  You can't do this easily (or, possibly, at all) with MARC.
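
As a minimal sketch of that point (the record values, the `record` wrapper element, and the function names are assumptions for illustration, not any standard binding), the same in-memory DCTERMS description can be written out as JSON or as XML and read back unchanged:

```python
import json
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/terms/"

# Hypothetical description: DCTERMS property names mapped to lists of values.
record = {
    "title": ["An example title"],
    "creator": ["Doe, Jane"],
    "subject": ["Examples", "Sketches"],
}


def to_json(rec):
    """Serialize the description as a JSON object."""
    return json.dumps(rec, sort_keys=True)


def to_xml(rec):
    """Serialize the same description as namespaced XML elements."""
    ET.register_namespace("dcterms", DC_NS)
    root = ET.Element("record")  # wrapper element name is an assumption
    for name in sorted(rec):
        for value in rec[name]:
            element = ET.SubElement(root, "{%s}%s" % (DC_NS, name))
            element.text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Rebuild the in-memory description from the XML form."""
    rec = {}
    for element in ET.fromstring(text):
        name = element.tag.split("}", 1)[1]  # drop the "{namespace}" part
        rec.setdefault(name, []).append(element.text)
    return rec


print(to_json(record))
print(to_xml(record))
```

Both serializations carry the same property/value pairs, so going from one to the other is a matter of agreeing on conventions like the ones above.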

Sometimes, worse is better [2].  But, hey, I always catch flak for dissing MARC.  ;-)

MJ

PS.  For those RDF-ites among us, I also happen to think that JSON makes a great data structure for a triple store, e.g. [3], but I think storing absolute URLs as predicates like the N2 spec does is stupid.
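
A rough sketch of what that looks like in Python.  The subject / predicate / list-of-objects nesting follows the layout described at [3]; the prefix table and the `compact` helpers are hypothetical, just one way to avoid spelling out absolute predicate URLs, and the example.org data is invented.

```python
# Assumed prefix table; the short names are illustrative only.
PREFIXES = {"dcterms": "http://purl.org/dc/terms/"}

# Subject -> predicate -> list of object descriptions, all with absolute URLs.
graph = {
    "http://example.org/book/1": {
        "http://purl.org/dc/terms/title": [
            {"value": "An example title", "type": "literal"}
        ],
        "http://purl.org/dc/terms/creator": [
            {"value": "http://example.org/person/1", "type": "uri"}
        ],
    }
}


def compact(uri, prefixes):
    """Shorten a URL to prefix:local form when a known prefix matches."""
    for prefix, base in prefixes.items():
        if uri.startswith(base):
            return prefix + ":" + uri[len(base):]
    return uri


def compact_predicates(g, prefixes):
    """Return a copy of the graph with predicate keys shortened."""
    return {
        subject: {compact(p, prefixes): objs for p, objs in preds.items()}
        for subject, preds in g.items()
    }


def triples(g):
    """Flatten the nested structure into (subject, predicate, value) tuples."""
    for subject, preds in g.items():
        for predicate, objs in preds.items():
            for obj in objs:
                yield (subject, predicate, obj["value"])


for triple in triples(compact_predicates(graph, PREFIXES)):
    print(triple)
```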

1. http://robotlibrarian.billdueber.com/data-structures-and-serializations/
2. http://en.wikipedia.org/wiki/Worse_is_better
3. http://n2.talis.com/wiki/RDF_JSON_Specification