LISTSERV 16.5 - CODE4LIB Archives

JSON++

I routinely re-index about 2.5M JSON records (originally from binary MARC), and it's several orders of magnitude faster than XML (measured in single-digit minutes rather than double-digit hours).  I'm not sure if it's in the same range as binary MARC, but as Tim says, it's plenty fast enough for pragmatic purposes.

Unfortunately JSON doesn't have as many mature tools for manipulation as XML (yet?), but I'd be inclined to call it the best of both worlds rather than a middle-ground or compromise.
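
For anyone who wants to try this, here is a minimal sketch of the kind of processing I mean. It assumes the records are stored one JSON object per line and are shaped loosely like the MARC-in-JSON proposals mentioned in this thread. The field layout, the sample data, and the helper names (`record_to_json_line`, `iter_records`, `title_of`) are all illustrative assumptions, not an agreed standard or anyone's production code.

```python
import io
import json

# Hypothetical record, loosely shaped like the MARC-in-JSON proposals:
# a leader string plus an ordered list of single-key field objects.
SAMPLE_RECORD = {
    "leader": "00000nam a2200000 a 4500",
    "fields": [
        {"001": "12345"},
        {"245": {"ind1": "1", "ind2": "0",
                 "subfields": [{"a": "Example title /"},
                               {"c": "Example author."}]}},
    ],
}


def record_to_json_line(record):
    """Serialize one record as a single line of JSON (no embedded newlines)."""
    return json.dumps(record, separators=(",", ":"))


def iter_records(stream):
    """Yield one parsed record per non-empty line of a line-delimited stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def title_of(record):
    """Return the first 245$a value, or None if the record has none."""
    for field in record["fields"]:
        data = field.get("245")
        if isinstance(data, dict):
            for subfield in data["subfields"]:
                if "a" in subfield:
                    return subfield["a"]
    return None


if __name__ == "__main__":
    # Write three copies to an in-memory "file", then read them back.
    lines = [record_to_json_line(SAMPLE_RECORD) for _ in range(3)]
    dump = io.StringIO("\n".join(lines))
    for rec in iter_records(dump):
        print(title_of(rec))
```

Reading one record per line means the whole file never has to be held in memory, and nothing in the format imposes a maximum record length.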

MJ

> MARC in JSON can be a nice middle ground: faster and smaller than MARCXML (although probably still not as fast or small as binary), based on a standard low-level data format and so easier to work with using existing tools (and developers' eyes) than binary, and with no maximum record length.
> There have been a couple of competing attempts to define a MARC-expressed-in-JSON 'standard', but none has really caught on yet. I like Ross's latest attempt: http://dilettantes.code4lib.org/blog/2010/09/a-proposal-to-serialize-marc-in-json/
> 
> Patrick Hochstenbach wrote:
>> Dear Nate,
>> 
>> There is a trade-off. Do you want very fast processing of data? Go for binary data. Do you want to share your data easily and globally in many (not necessarily library-related) environments? Go for XML/RDF. Open your data and do both :-)
>> 
>> Pat
>> 
>> On 25 Oct 2010, at 20:39, "Nate Vack" wrote:
>> 
>>> Hi all,
>>> 
>>> I've just spent the last couple of weeks delving into and decoding a
>>> binary file format. This, in turn, got me thinking about MARCXML.
>>> 
>>> In a nutshell, it looks like it's supposed to contain the exact same
>>> data as a normal MARC record, except in XML form. As in, it should be
>>> round-trippable.
>>> 
>>> What's the advantage to this? I can see using a human-readable format
>>> for poorly-documented file formats -- they're relatively easy to read
>>> and understand. But MARC is well, well-documented, with more than one
>>> free implementation in cursory searching. And once you know a binary
>>> file's format, it's no harder to parse than XML, and the data's
>>> smaller and processing faster.
>>> 
>>> So... why the XML?
>>> 
>>> Curious,
>>> -Nate