I hate GroupWise for forcing me to top-post.

Yes, you are right about everything. Limiting MARC-HASH to just UTF-8, rather than supporting the full range of encodings allowed by JSON, probably makes it easier to generate and parse. It will bloat the size of the format for characters outside of the Basic Multilingual Plane, but probably nobody cares; bandwidth is cheap, right? And this is primarily meant as a transmission format.

I missed the part in the blog entry about newline-delimited JSON because I was specifically looking for a mention of "collections". Newline-delimited JSON would work, yes, and would probably be easier, faster, and less memory-intensive to parse.

Dan

>>> Jonathan Rochkind <[log in to unmask]> 03/18/10 10:41 AM >>>
So do you think the marc-hash-to-json "proto-spec" should require the
encoding to be UTF-8, or should it leave it open to anything that's
legal JSON?  (Is there a problem I don't know about with expressing
"characters outside of the Basic Multilingual Plane" in UTF-8?  Any
Unicode character can be encoded in any of the Unicode encodings,
right?)
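For reference, a minimal Python sketch of the encoding question (the sample character U+20000, the first CJK Extension B ideograph, is an arbitrary choice). It shows that a character above U+FFFF encodes to four bytes in UTF-8, UTF-16, and UTF-32 alike, and that JSON's `\u` escape, which only holds 16-bit values, represents it as a surrogate pair, the workaround Dan mentions from Evergreen further down the thread.

```python
import json

# U+20000 lies outside the Basic Multilingual Plane (code point > U+FFFF).
ch = "\U00020000"

utf8 = ch.encode("utf-8")       # 4 bytes: F0 A0 80 80
utf16 = ch.encode("utf-16-be")  # 4 bytes: surrogate pair D840 DC00
utf32 = ch.encode("utf-32-be")  # 4 bytes: 00 02 00 00

# With ensure_ascii=True (the default), json.dumps writes non-ASCII text
# as \uXXXX escapes; a non-BMP character becomes two escapes (a surrogate pair).
escaped = json.dumps(ch)

# With ensure_ascii=False the character is emitted as-is, ready to be
# written to a UTF-8 stream unchanged.
raw = json.dumps(ch, ensure_ascii=False)

# Every representation decodes back to the same single code point.
assert utf8.decode("utf-8") == utf16.decode("utf-16-be") == utf32.decode("utf-32-be") == ch
assert json.loads(escaped) == json.loads(raw) == ch
```

So restricting the format to UTF-8 loses nothing in expressiveness; the only question is which byte form a producer emits.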

If "collections" means what I think, Bill's blog proto-spec says they
should be serialized as JSON-separated-by-newlines, right?  That is,
JSON for each record, separated by newlines, rather than the alternative
approach you hypothesize there.  There are various reasons to prefer
JSON-separated-by-newlines, which is an actual convention used in the
wild, not something made up just for here.
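A minimal sketch of the newline-delimited approach, assuming records are plain dicts shaped loosely like the marc-hash keys used in this thread (type, version, leader, fields); the leader values and empty field lists are placeholders, not real data. The point it illustrates: a standard JSON serializer escapes any newline inside a string, so each record occupies exactly one line and a reader can parse records one at a time instead of loading a whole collection.

```python
import io
import json

# Placeholder records; keys follow the marc-hash names used in this thread,
# values are illustrative only.
records = [
    {"type": "marc-hash", "version": [1, 0], "leader": "00000nam a2200000 a 4500", "fields": []},
    {"type": "marc-hash", "version": [1, 0], "leader": "00000nam a2200000 a 4500", "fields": []},
]

def write_collection(recs, out):
    """Write one compact JSON object per line (newline-delimited JSON)."""
    for rec in recs:
        # json.dumps escapes a newline inside any string as \n,
        # so the serialized record never spans more than one line.
        out.write(json.dumps(rec, separators=(",", ":")))
        out.write("\n")

def read_collection(inp):
    """Yield records one at a time; only the current line is held in memory."""
    for line in inp:
        line = line.strip()
        if line:  # tolerate a trailing blank line
            yield json.loads(line)

buf = io.StringIO()
write_collection(records, buf)
buf.seek(0)
parsed = list(read_collection(buf))
```

By contrast, a single `"collection": [...]` wrapper is one JSON document, so a typical parser must read the whole collection before it can hand back the first record.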

Jonathan

Dan Scott wrote:
> Hey Bill:
>
> Do you have unit tests for MARC-HASH / JSON anywhere? If you do, that would make it easier for me to create a compliant PHP File_MARC_JSON variant, which I'll be happy-ish to create.
>
> The only concerns I have with your write-up are:
>   * JSON itself allows UTF-8, UTF-16, and UTF-32 encoding, and we've seen in Evergreen some cases where characters outside of the Basic Multilingual Plane are required. We eventually resorted to surrogate pairs in that case, so maybe this isn't a real issue.
>   * You've mentioned that you would like to see better support for collections in File_MARC / File_MARCXML; but I don't see any mention of how collections would work in MARC-HASH / JSON. Would it just be something like the following?
>
> "collection": [
>   {
>     "type" : "marc-hash",
>     "version" : [1, 0],
>     "leader" : "…leader string … ",
>     "fields" : [array, of, fields]
>   },
>   {
>     "type" : "marc-hash",
>     "version" : [1, 0],
>     "leader" : "…leader string … ",
>     "fields" : [array, of, fields]
>   }
> ]
>
> Dan
>
>   
>>>> Bill Dueber <[log in to unmask]> 03/15/10 12:22 PM >>>
>>>>         
> I'm pretty sure Andrew was (a) completely unaware of anything I'd done, and
> (b) looking to match marc-xml as strictly as reasonable.
>
> I also like the array-based rather than hash-based format, but I'm not gonna
> go to the mat for it if no one else cares much.
>
> I would like to see ind1 and ind2 get their own fields, though, for easier
> use of stuff like jsonpath in json-centric nosql databases.
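To illustrate the indicator point, here is a hypothetical hash-based field layout with `ind1` and `ind2` as separate keys. It is not quoted from either proposal in this thread, and the helper `fields_where` is invented for the example. With separate keys, a query reduces to an equality test on a named property, which is roughly what a JSONPath filter such as `$.fields[?(@.ind2=='0')]` expresses.

```python
# Hypothetical layout: each data field carries ind1 and ind2 as their own keys.
record = {
    "leader": "00000nam a2200000 a 4500",
    "fields": [
        {"tag": "245", "ind1": "1", "ind2": "0", "subfields": [["a", "A title"]]},
        {"tag": "650", "ind1": " ", "ind2": "0", "subfields": [["a", "Cataloging"]]},
        {"tag": "650", "ind1": " ", "ind2": "7", "subfields": [["a", "Cataloguing"], ["2", "local"]]},
    ],
}

def fields_where(rec, tag, **indicators):
    """Return fields with the given tag whose indicator keys match exactly."""
    return [
        f for f in rec["fields"]
        if f["tag"] == tag and all(f.get(k) == v for k, v in indicators.items())
    ]

# 650 fields with second indicator 0 (Library of Congress Subject Headings in MARC 21).
lcsh = fields_where(record, "650", ind2="0")
headings = [sf[1] for f in lcsh for sf in f["subfields"] if sf[0] == "a"]
```

If both indicators were packed into one string, the same query would instead need a test on a specific character position, which generic JSONPath filters in document stores may not support as conveniently.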
>
> On Mon, Mar 15, 2010 at 10:52 AM, Jonathan Rochkind <[log in to unmask]> wrote:
>
>   
>> I would just ask why you didn't use Bill Dueber's already existing
>> proto-spec, instead of making up your own incompatible one.
>>
>> I'd think we could somehow all do the same consistent thing here.
>>
>> Since my interest in marc-json is getting as small a package as possible
>> for transfer across the wire, I prefer Bill's approach.
>>
>> http://robotlibrarian.billdueber.com/new-interest-in-marc-hash-json/
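A rough way to see why an array-based layout yields smaller packages: a hash-based layout repeats its key names in every field and subfield. Both shapes below are assumptions for illustration only; the array form approximates a [tag, ind1, ind2, subfields] style, and neither is quoted from Bill's blog post or the OCLC draft, so consult those for the actual definitions.

```python
import json

# The same hypothetical 245 field in two shapes.
as_array = ["245", "1", "0", [["a", "A title :"], ["b", "a subtitle"]]]
as_hash = {
    "tag": "245",
    "ind1": "1",
    "ind2": "0",
    "subfields": [
        {"code": "a", "value": "A title :"},
        {"code": "b", "value": "a subtitle"},
    ],
}

def compact_size(obj):
    """UTF-8 byte length of the most compact JSON serialization (no extra spaces)."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))

array_bytes = compact_size(as_array)
hash_bytes = compact_size(as_hash)
```

The difference grows with the number of fields and subfields per record, since every one of them repeats its key names in the hash form.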
>>
>>
>> Houghton,Andrew wrote:
>>
>>     
>>> From: Houghton,Andrew
>>>       
>>>> Sent: Saturday, March 06, 2010 06:59 PM
>>>> To: Code for Libraries
>>>> Subject: RE: [CODE4LIB] Q: XML2JSON converter
>>>>
>>>> Depending on how much time I get next week I'll talk with the developer
>>>> network folks to see what I need to do to put a specification under
>>>> their infrastructure
>>>>
>>>>
>>>>         
>>> I finished documenting our existing use of MARC-JSON.  The specification
>>> can be found on the OCLC developer network wiki [1].  Since it is a wiki,
>>> registered developer network members can edit the specification and I would
>>> ask that you refrain from doing so.
>>>
>>> However, please do use the discussion tab to record issues with the
>>> specification or add additional information to existing issues.  There are
>>> already two open issues on the discussion tab and you can use them as a
>>> template for new issues.  The first issue is Bill Dueber's request for some
>>> sort of versioning and the second issue is whether the specification should
>>> specify the flavor of MARC, e.g., marc21, unicode, etc.
>>>
>>> It is recommended that you place issues on the discussion tab since that
>>> will be the official place for documenting and disposing of them.  I do
>>> monitor this listserv and the OCLC developer network listserv, but I only
>>> selectively look at messages on those listservs.  If you would like to use
>>> this listserv or the OCLC developer network listserv to discuss the
>>> MARC-JSON specification, make sure you place MARC-JSON in the subject line,
>>> to give me a clue that I *should* look at that message, or directly CC my
>>> e-mail address on your post.
>>>
>>> This message marks the beginning of a two-week comment period on the
>>> specification, which will end at midnight on 2010-03-28.
>>>
>>> [1] <http://worldcat.org/devnet/wiki/MARC-JSON_Draft_2010-03-11>
>>>
>>>
>>> Thanks, Andy.
>>>
>>>
>>>       
>
>
>