Eh, I'm still intuitively opposed to pull parsing. Okay, so there are 
some useful libraries these days... if you're using the right 
language. But what if you're using Ruby and don't want to depend on 
native C code, just as an example? It seems like we want to arrive at 
something easy enough to interpret _anywhere_, since you never know 
where you'll need to process MARC.

Simply reading a list of JSON records one at a time -- it doesn't seem 
like too much to ask for a solution that doesn't require complicated 
code that has only been written for some platforms. This isn't a 
complicated enough problem to warrant complicated solutions like JSON 
pull parsers.

Newline-delimited is certainly one simple solution, even though the 
aggregate file is not valid JSON. Does that matter? I'm not sure there 
are any simple solutions that still give you valid JSON, but if there 
aren't, I'd rather sacrifice valid JSON (for which it's unclear there's 
any important use case anyway) than sacrifice simplicity.
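
Just to illustrate how little code we're talking about, here's a rough 
sketch in Python of reading newline-delimited marc-in-json with nothing 
but the standard library. The file name is made up, and I'm assuming 
each record carries a top-level "leader" key per the marc-in-json 
convention:

    import json

    # Rough sketch: read newline-delimited marc-in-json one record at a time.
    # No pull parser needed -- each line is a complete, independent JSON object.
    with open("records.ndj") as f:   # hypothetical file name
        for line in f:
            line = line.strip()
            if not line:
                continue             # tolerate blank lines
            record = json.loads(line)
            print(record["leader"])  # "leader" per the marc-in-json convention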

On 12/1/2011 2:47 PM, Bill Dueber wrote:
> I was a strong proponent of NDJ at one point, but I've grown less strident
> and more weary since then.
>
> Brad Baxter has a good overview of some options[1]. I'm assuming it's a
> given we'd all prefer to work with valid JSON files if the pain-point can
> be brought down far enough.
>
> A couple years have passed since we first talked about this stuff, and the
> state of JSON pull-parsers is better than it once was:
>
>    * yajl[2] is a super-fast C library for parsing json and supports stream
> parsing. Bindings for ruby, node, python, and perl are linked to off the
> home page. I found one PHP binding[3] on github which is broken/abandoned,
> and no other pull-parser for PHP that I can find. Sadly, the ruby wrapper
> doesn't actually expose the callbacks necessary for pull-parsing, although
> there is a pull request[4] and at least one other option[5].
>    * Perl's JSON::XS supports incremental parsing
>    * the Jackson java library[6] is excellent and has an easy-to-use
> pull-parser. There are a few simplistic efforts to wrap it for jruby/jython
> use as well.
>
> Pull-parsing is ugly, but no longer astoundingly difficult or slow, with
> the possible exception of PHP. And output is simple enough.
>
> As much as it makes me shudder, I think we're probably better off trying to
> do pull parsers and have a marc-in-json document be a valid JSON array.
>
> We could easily adopt a *convention* of, essentially, one-record-per-line,
> but wrap it in '[]' to make it valid json. That would allow folks with a
> pull-parser to write a real streaming reader, and folks without to "cheat"
> (ditch the leading and trailing [], and read the rest as
> one-record-per-line) until such a time as they can start using a more
> full-featured json parser.
>
> 1.
> http://en.wikipedia.org/wiki/User:Baxter.brad/Drafts/JSON_Document_Streaming_Proposal
> 2. http://lloyd.github.com/yajl/
> 3. https://github.com/sfalvo/php-yajl
> 4. https://github.com/brianmario/yajl-ruby/pull/50
> 5. http://dgraham.github.com/json-stream/
> 6. http://wiki.fasterxml.com/JacksonHome
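
For what it's worth, even the "cheat" reader for the []-wrapped 
convention described above is only a few lines. A rough Python sketch, 
assuming the brackets sit on lines of their own and each record line 
ends with a comma -- the file name is just illustrative:

    import json

    # Rough sketch of the "cheat": treat a []-wrapped, one-record-per-line
    # file as newline-delimited by skipping the array punctuation.
    with open("records.json") as f:                # hypothetical file name
        for line in f:
            line = line.strip()
            if line in ("[", "]", ""):             # ditch the leading/trailing brackets
                continue
            record = json.loads(line.rstrip(","))  # drop the separating comma
            print(record["leader"])                # "leader" per the marc-in-json convention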
>
>
>
> On Thu, Dec 1, 2011 at 12:56 PM, Michael B. Klein<[log in to unmask]>  wrote:
>
>> +1 to marc-in-json
>> +1 to newline-delimited records
>> +1 to read support
>> +1 to edsu, rsinger, BillDueber, gmcharlt, and the other module maintainers
>>
>> On Thu, Dec 1, 2011 at 9:31 AM, Keith Jenkins<[log in to unmask]>  wrote:
>>
>>> On Thu, Dec 1, 2011 at 11:56 AM, Gabriel Farrell<[log in to unmask]>
>>> wrote:
>>>> I suspect newline-delimited will win this race.
>>> Yes.  Everyone please cast a vote for newline-delimited JSON.
>>>
>>> Is there any consensus on the appropriate mime type for ndj?
>>>
>>> Keith
>>>
>
>