On Fri, Mar 5, 2010 at 4:38 PM, Houghton,Andrew <[log in to unmask]> wrote:
>
> Maybe I have been mislead or misunderstood JSON streaming.
This is my central point. I'm actually saying that JSON streaming is painful
and rare enough that it should be avoided as a requirement for working with
any new format.
I guess, in sum, I'm making the following assertions:
1. Streaming APIs for JSON, where they exist, are a pain in the ass. And
they don't exist everywhere. Without a JSON streaming parser, you have to
pull the whole array of documents up into memory, which may be impossible.
This is the crux of my argument -- if you disagree with it, then I would
assume you disagree with the other points as well.
2. Many people -- and I don't think I'm exaggerating here, honestly --
really don't like using MARC-XML but have to because of the length
restrictions on MARC-binary. A useful alternative, based on dead-easy
parsing and production, is very appealing.
2.5 Having to deal with a streaming API takes away the "dead-easy" part.
3. If you accept my assertions about streaming parsers, then dealing with
the format you've proposed for large sets is either painful (with a
streaming API) or impossible (where such an API doesn't exist) due to memory
constraints.
4. Streaming JSON writer APIs are also painful; everything that applies to
reading applies to writing. Sans a streaming writer, trying to *write* a
large JSON document also results in you having to have the whole thing in
memory.
5. People are going to want to deal with this format, because of its
benefits over marc21 (record length) and marc-xml (ease of processing),
which means we're going to want to deal with big sets of data and/or dump
batches of it to a file. Which brings us back to #1, the pain or absence of
streaming apis.
"Write a better JSON parser/writer" or "use a different language" seem like
bad solutions to me, especially when a (potentially) useful alternative
exists.
As I pointed out, if streaming JSON is no harder/unavailable to you than
non-streaming json, then this is mostly moot. I assert that for many people
in this community it is one or the other, which is why I'm leery of it.
-Bill-
--
Bill Dueber
Library Systems Programmer
University of Michigan Library
|