> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> Bill Dueber
> Sent: Saturday, March 06, 2010 05:11 PM
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] Q: XML2JSON converter
> 
> Anyway, hopefully, it won't be a huge surprise that I don't disagree
> with any of the quote above in general; I would assert, though, that
> application/json and application/marc+json should both return JSON
> (in the same way that text/xml, application/xml, and 
> application/marc+xml can all be expected to return XML). 
> Newline-delimited json is starting to crop up in a few places 
> (e.g. couchdb) and should probably have its own mime type
> and associated extension. So I would say something like:
> 
> application/json -- return json (obviously)
> application/marc+json  -- return json
> application/marc+ndj  -- return newline-delimited json

This sounds like consensus on how to deal with newline-delimited JSON in a standards-based manner.

I'm not familiar with CouchDB, but I am using MongoDB, which is similar. I'll have to dig into how they deal with this newline-delimited JSON. Can you provide any references to get me started?

> In all cases, we should agree on a standard record serialization,
> though, and the pure-json returns should include something that 
> indicates what the heck it is (hopefully a URI that can act as a 
> distinct "namespace"-type identifier, including a version in it).

I agree that our MARC-JSON serialization needs some "namespace" identifier in it. It also occurred to me that the indicators, currently handled as separate ind1 and ind2 properties, might be better handled as an array, to accommodate IFLA's MARC-XML-ish format, where a field can have from 1 to 9 indicator values.
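
To make the indicator idea concrete, here is a rough Python sketch of folding numbered indicator properties into a single array. The field layout ("tag", "subfields") and the "ind" property name are hypothetical, chosen only for illustration; they are not taken from the MARC-JSON draft.

```python
def indicators_as_list(field):
    """Return a copy of a field dict with ind1..ind9 folded into an 'ind' list.

    The 'ind' key and the overall field layout are assumptions for this sketch.
    """
    def is_indicator(key):
        return key.startswith("ind") and key[3:].isdigit()

    # Keep every property that is not a numbered indicator.
    converted = {k: v for k, v in field.items() if not is_indicator(k)}
    # Collect the indicators in numeric order: ind1, ind2, ..., ind9.
    keys = sorted((k for k in field if is_indicator(k)), key=lambda k: int(k[3:]))
    converted["ind"] = [field[k] for k in keys]
    return converted


field = {"tag": "245", "ind1": "1", "ind2": "0", "subfields": [{"a": "Title"}]}
print(indicators_as_list(field))
```

A field with a single indicator, or with nine, would go through the same code unchanged, which is the point of using an array.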

BTW, our MARC-JSON content is specified in Unicode, not MARC-8, per the JSON standard, which means you need to use \uXXXX notation to specify characters in strings. I'm not sure I made that clear in earlier posts. A downside of the current ECMA 262 specification is that it doesn't support \U00XXXXXX, as Python does, for the extended characters. Hopefully that will get rectified in a future ECMA 262 specification.
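
As a small illustration of the escaping issue: JSON can still carry characters outside the Basic Multilingual Plane, but only as a UTF-16 surrogate pair written as two \uXXXX escapes. The sketch below uses Python's standard json module; the property name "a" and the sample string are made up for the example.

```python
import json

# U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic Multilingual Plane,
# so Python needs the eight-digit \U escape to write it as one code point.
clef = "\U0001D11E"

# json.dumps escapes all non-ASCII characters by default (ensure_ascii=True).
# The clef comes out as a surrogate pair, i.e. two \uXXXX escapes.
encoded = json.dumps({"a": "caf\u00e9 " + clef})
print(encoded)  # {"a": "caf\u00e9 \ud834\udd1e"}

# Decoding joins the surrogate pair back into the single code point.
decoded = json.loads(encoded)
print(decoded["a"] == "caf\u00e9 " + clef)  # True
```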

> The question for me, I think, is whether within this community,  anyone
> who provides one of these types (application/marc+json and
> application/marc+ndj) should automatically be expected to provide both.
> I don't have an answer for that.

I think this issue gets into familiar territory when dealing with RDF formats. Let's see, there are N3, NT, XML, Turtle, etc. Do you need to provide all of them? No, but it's nice of the server to at least provide NT or Turtle and XML. Ultimately it's up to the server. But the only difference between use cases #2 and #3 is whether the output is wrapped in an array, so it's probably easy for the server to produce both.
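
To show how little separates the two outputs, here is a minimal Python sketch that serializes the same records either as one JSON array or as newline-delimited JSON. The media type names are the ones proposed in this thread, not registered types, and the function names and sample records are hypothetical.

```python
import json


def to_json_array(records):
    """Serialize all records as a single JSON array."""
    return json.dumps(records)


def to_ndjson(records):
    """Serialize one record per line. json.dumps never emits a raw newline
    when no indent is given, so each record stays on its own line."""
    return "".join(json.dumps(record) + "\n" for record in records)


# Media types as proposed in this thread (assumption: neither is registered).
SERIALIZERS = {
    "application/marc+json": to_json_array,
    "application/marc+ndj": to_ndjson,
}


def render(records, media_type):
    """Pick the serializer that matches the requested media type."""
    return SERIALIZERS[media_type](records)


records = [{"id": 1}, {"id": 2}]
print(render(records, "application/marc+json"))
print(render(records, "application/marc+ndj"), end="")
```

A client reading the newline-delimited form can parse it one line at a time without holding the whole result set in memory, which the array form does not allow as easily.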

Depending on how much time I get next week, I'll talk with the developer network folks to see what I need to do to put a specification under their infrastructure. From my schedule, it looks like it's going to be another week of hell :(


Andy.