Oh, I wasn't actually suggesting that limiting to UTF-8 was the right way to
go; I was asking your opinion! It's not at all clear to me, but if your
opinion is that UTF-8 is indeed the right way to go, that's comforting. :)

Bandwidth _does_ matter, I think. It's primarily intended as a
transmission format, and the reason _I_ am interested in it as a
transmission format over MarcXML is in large part precisely because it
will be so much smaller a package. I'm running into various performance
problems caused by the very large package size of MarcXML. (Disk space
might be cheap, but bandwidth, over the network or to the file system,
is not necessarily, for me anyway.)

But I'm not sure I'm concerned about UTF-8 bloating the size of the
response; I think it will still be manageable and worth it to avoid
confusion. I pretty much do _everything_ in UTF-8 myself these days,
because it's just not worth the headache to me to do anything else. But
I have MUCH less experience dealing with international character sets
than you, which is why I was curious as to your opinion. There's no
reason the marc-hash-in-json proto-spec couldn't allow any valid JSON
character encoding, if you/we/someone thinks it's necessary/more
convenient.
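
To put rough numbers on the size question, here is a minimal Python sketch comparing how many bytes a few sample characters take in each of the encodings JSON allows. The sample strings and the `encoded_sizes` helper are made up for illustration; they aren't from any real MARC record or from the proto-spec.

```python
import json

# Hypothetical sample strings, chosen only to cover different code point ranges.
samples = {
    "ascii": "Title",          # plain ASCII
    "latin": "\u00e9",         # U+00E9, inside the Basic Multilingual Plane
    "cjk": "\u4e2d",           # U+4E2D, inside the Basic Multilingual Plane
    "non_bmp": "\U00020000",   # U+20000, outside the Basic Multilingual Plane
}


def encoded_sizes(text):
    """Return the byte length of text in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


sizes = {name: encoded_sizes(text) for name, text in samples.items()}

# With the default ensure_ascii=True, json.dumps writes a non-BMP character
# as a \uXXXX\uXXXX surrogate pair escape, which is plain ASCII on the wire.
escaped = json.dumps(samples["non_bmp"])

if __name__ == "__main__":
    for name, by_encoding in sizes.items():
        print(name, by_encoding)
    print("escaped non-BMP:", escaped)
```

In this sketch the character outside the Basic Multilingual Plane takes four bytes in all three encodings. Where UTF-8 comes out larger than UTF-16 is for BMP characters above U+07FF, such as CJK, at three bytes versus two.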
Jonathan

Dan Scott wrote:
> I hate Groupwise for forcing me to top-post.
>
> Yes, you are right about everything. Limiting MARC-HASH to just UTF8, rather than supporting the full range of encodings allowed by JSON, probably makes it easier to generate and parse; it will bloat the size of the format for characters outside of the Basic Multilingual Plane but probably nobody cares, bandwidth is cheap, right? And this is primarily meant as a transmission format.
>
> I missed the part in the blog entry about the newline-delimited JSON because I was specifically looking for a mention of "collections". Newline-delimited JSON would work, yes, and probably be easier / faster / less memory-intensive to parse.
>
> Dan
>
>
>>>> Jonathan Rochkind <[log in to unmask]> 03/18/10 10:41 AM >>>
>>>>
> So do you think the marc-hash-to-json "proto-spec" should suggest that
> the encoding HAS to be UTF-8, or should it leave it open to anything
> that's legal JSON? (Is there a problem I don't know about with
> expressing "characters outside of the Basic Multilingual Plane" in
> UTF-8? Any Unicode char can be encoded in any of the Unicode encodings,
> right?).
>
> If "collections" means what I think, Bill's blog proto-spec says they
> should be serialized as JSON-separated-by-newlines, right? That is,
> JSON for each record, separated by newlines, rather than the alternative
> approach you hypothesize there. There are various reasons to prefer
> json-separated-by-newlines, which is an actual convention used in the
> wild, not something made up just for here.
>
> Jonathan
>
> Dan Scott wrote:
>
>> Hey Bill:
>>
>> Do you have unit tests for MARC-HASH / JSON anywhere? If you do, that would make it easier for me to create a compliant PHP File_MARC_JSON variant, which I'll be happy-ish to create.
>>
>> The only concerns I have with your write-up are:
>> * JSON itself allows UTF8, UTF16, and UTF32 encoding - and we've seen in Evergreen some cases where characters outside of the Basic Multilingual Plane are required. We eventually wound up resorting to surrogate pairs, in that case; so maybe this isn't a real issue.
>> * You've mentioned that you would like to see better support for collections in File_MARC / File_MARCXML; but I don't see any mention of how collections would work in MARC-HASH / JSON. Would it just be something like the following?
>>
>> "collection": [
>>   {
>>     "type" : "marc-hash",
>>     "version" : [1, 0],
>>     "leader" : "…leader string … ",
>>     "fields" : [array, of, fields]
>>   },
>>   {
>>     "type" : "marc-hash",
>>     "version" : [1, 0],
>>     "leader" : "…leader string … ",
>>     "fields" : [array, of, fields]
>>   }
>> ]
>>
>> Dan
>>
>>
>>
>>>>> Bill Dueber <[log in to unmask]> 03/15/10 12:22 PM >>>
>>>>>
>>>>>
>> I'm pretty sure Andrew was (a) completely unaware of anything I'd done, and
>> (b) looking to match marc-xml as strictly as reasonable.
>>
>> I also like the array-based rather than hash-based format, but I'm not gonna
>> go to the mat for it if no one else cares much.
>>
>> I would like to see ind1 and ind2 get their own fields, though, for easier
>> use of stuff like jsonpath in json-centric nosql databases.
>>
>> On Mon, Mar 15, 2010 at 10:52 AM, Jonathan Rochkind <[log in to unmask]> wrote:
>>
>>
>>
>>> I would just ask why you didn't use Bill Dueber's already existing
>>> proto-spec, instead of making up your own incompatible one.
>>>
>>> I'd think we could somehow all do the same consistent thing here.
>>>
>>> Since my interest in marc-json is getting as small a package as possible
>>> for transfer across the wire, I prefer Bill's approach.
>>>
>>> http://robotlibrarian.billdueber.com/new-interest-in-marc-hash-json/
>>>
>>>
>>> Houghton,Andrew wrote:
>>>
>>>
>>>
>>>> From: Houghton,Andrew
>>>>
>>>>
>>>>> Sent: Saturday, March 06, 2010 06:59 PM
>>>>> To: Code for Libraries
>>>>> Subject: RE: [CODE4LIB] Q: XML2JSON converter
>>>>>
>>>>> Depending on how much time I get next week I'll talk with the developer
>>>>> network folks to see what I need to do to put a specification under
>>>>> their infrastructure
>>>>>
>>>>>
>>>>>
>>>>>
>>>> I finished documenting our existing use of MARC-JSON. The specification
>>>> can be found on the OCLC developer network wiki [1]. Since it is a wiki,
>>>> registered developer network members can edit the specification and I would
>>>> ask that you refrain from doing so.
>>>>
>>>> However, please do use the discussion tab to record issues with the
>>>> specification or add additional information to existing issues. There are
>>>> already two open issues on the discussion tab and you can use them as a
>>>> template for new issues. The first issue is Bill Dueber's request for some
>>>> sort of versioning and the second issue is whether the specification should
>>>> specify the flavor of MARC, e.g., marc21, unicode, etc.
>>>>
>>>> It is recommended that you place issues on the discussion tab since that
>>>> will be the official place for documenting and disposing of them. I do
>>>> monitor this listserv and the OCLC developer network listserv, but I only
>>>> selectively look at messages on those listservs. If you would like to use
>>>> this listserv or the OCLC developer network listserv to discuss the
>>>> MARC-JSON specification, make sure you place MARC-JSON in the subject line,
>>>> to give me a clue that I *should* look at that message, or directly CC my
>>>> e-mail address on your post.
>>>>
>>>> This message marks the beginning of a two-week comment period on the
>>>> specification, which will end at midnight 2010-03-28.
>>>>
>>>> [1] <http://worldcat.org/devnet/wiki/MARC-JSON_Draft_2010-03-11>
>>>>
>>>>
>>>> Thanks, Andy.
>>>>
>>>>
>>>>
>>>>
>>
>>
>
>