Eh, I'm still intuitively opposed to pull parsing. Okay, so there are
some useful libraries these days... if you're using the right
language. But what if you're using Ruby and don't want to depend on
native C code, just as one example? It seems like we want to arrive at
something easy enough to interpret _anywhere_, since you never know
where you'll need to process MARC.
Simply reading a list of JSON records one at a time -- it's not too
much to ask for a solution that doesn't require complicated code that
has only been written for some platforms. This just isn't a
complicated enough problem to justify resorting to solutions like JSON
pull parsers.
Newline-delimited is certainly one simple solution, even though the
aggregate file is not valid JSON. Does that matter? I'm not sure there
are any simple solutions that still give you valid JSON, but if there
aren't, I'd rather sacrifice valid JSON (and it's unclear there's any
important use case for validity anyway) than sacrifice simplicity.
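For what it's worth, here's the kind of reader I have in mind -- my own
illustration, not code from any of the modules mentioned in this
thread. It assumes one record per line, and as a bonus it tolerates the
'[]'-wrapped variant Bill describes below by skipping the bracket lines
and trimming trailing commas:

```ruby
require 'json'

# Yield one parsed record per line from an IO. Works on plain
# newline-delimited JSON, and also on the "valid JSON array" convention
# (leading '[' line, trailing ']' line, comma after each record).
def each_record(io)
  io.each_line do |line|
    line = line.strip.chomp(',')            # drop a trailing comma, if any
    next if line.empty? || line == '[' || line == ']'
    yield JSON.parse(line)
  end
end

# Usage:
#   File.open('records.ndj') do |f|
#     each_record(f) { |rec| puts rec['leader'] }
#   end
```

No pull parser, no C extension -- just a line loop and a vanilla JSON
parse, which is about the level of effort I think we should be aiming
for on every platform.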
On 12/1/2011 2:47 PM, Bill Dueber wrote:
> I was a strong proponent of NDJ at one point, but I've grown less strident
> and more weary since then.
>
> Brad Baxter has a good overview of some options[1]. I'm assuming it's a
> given we'd all prefer to work with valid JSON files if the pain-point can
> be brought down far enough.
>
> A couple years have passed since we first talked about this stuff, and the
> state of JSON pull-parsers is better than it once was:
>
> * yajl[2] is a super-fast C library for parsing json and supports
> stream parsing. Bindings for ruby, node, python, and perl are linked to off the
> home page. I found one PHP binding[3] on github which is broken/abandoned,
> and no other pull-parser for PHP that I can find. Sadly, the ruby wrapper
> doesn't actually expose the callbacks necessary for pull-parsing, although
> there is a pull request[4] and at least one other option[5].
> * Perl's JSON::XS supports incremental parsing
> * the Jackson java library[6] is excellent and has an easy-to-use
> pull-parser. There are a few simplistic efforts to wrap it for jruby/jython
> use as well.
>
> Pull-parsing is ugly, but no longer astoundingly difficult or slow, with
> the possible exception of PHP. And output is simple enough.
>
> As much as it makes me shudder, I think we're probably better off trying to
> do pull parsers and have a marc-in-json document be a valid JSON array.
>
> We could easily adopt a *convention* of, essentially, one-record-per-line,
> but wrap it in '[]' to make it valid json. That would allow folks with a
> pull-parser to write a real streaming reader, and folks without to "cheat"
> (ditch the leading and trailing [], and read the rest as
> one-record-per-line) until such a time as they can start using a more
> full-featured json parser.
>
> 1.
> http://en.wikipedia.org/wiki/User:Baxter.brad/Drafts/JSON_Document_Streaming_Proposal
> 2. http://lloyd.github.com/yajl/
> 3. https://github.com/sfalvo/php-yajl
> 4. https://github.com/brianmario/yajl-ruby/pull/50
> 5. http://dgraham.github.com/json-stream/
> 6. http://wiki.fasterxml.com/JacksonHome
>
>
>
> On Thu, Dec 1, 2011 at 12:56 PM, Michael B. Klein<[log in to unmask]> wrote:
>
>> +1 to marc-in-json
>> +1 to newline-delimited records
>> +1 to read support
>> +1 to edsu, rsinger, BillDueber, gmcharlt, and the other module maintainers
>>
>> On Thu, Dec 1, 2011 at 9:31 AM, Keith Jenkins<[log in to unmask]> wrote:
>>
>>> On Thu, Dec 1, 2011 at 11:56 AM, Gabriel Farrell<[log in to unmask]>
>>> wrote:
>>>> I suspect newline-delimited will win this race.
>>> Yes. Everyone please cast a vote for newline-delimited JSON.
>>>
>>> Is there any consensus on the appropriate mime type for ndj?
>>>
>>> Keith
>>>
>
>