> ISO 2709 doesn't care how many bytes your characters are. The directory
> and offsets and other things count bytes, not characters.

That was exactly my point.  (Which I am stating since you quoted me and I couldn't tell if you were refuting my point, or using it to support your conclusion.)  ;-)

-- Michael

> -----Original Message-----
> From: Jonathan Rochkind [mailto:[log in to unmask]]
> Sent: Wednesday, April 18, 2012 11:09 AM
> To: Code for Libraries
> Cc: Doran, Michael D
> Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about
> ISO_2709 and MARC21
> 
> On 4/18/2012 11:09 AM, Doran, Michael D wrote:
> > I don't believe that is the case.  Take UTF-8 out of the picture, and
> > consider the MARC-8 character set with its escape sequences and combining
> > characters.  A character such as an "n" with a tilde would consist of two
> > bytes.  The Greek small letter alpha, if invoked in accordance with ANSI
> > X3.41, would consist of five bytes (two bytes for the initial escape
> > sequence, a byte for the character, and then two bytes for the escape
> > sequence returning to the default character set).
> 
> ISO 2709 doesn't care how many bytes your characters are. The directory
> and offsets and other things count bytes, not characters. (which was, in
> my opinion, the _right_ decision, for once with marc!)
> 
> How bytes translate into characters is not a concern of ISO 2709.
> 
> The majority of non-7-bit-ASCII encodings will have chars that are more
> than one byte, either sometimes or always. This is true of MARC-8 (some
> chars), UTF-8 (some chars), and UTF-16 (all chars). (It is not true of
> Latin-1 though, for instance, I don't think.)
> 
> ISO 2709 doesn't care what char encodings you use, and there's no
> standard ISO 2709 way to determine what char encodings are used for
> _data_ in the MARC record. ISO 2709 does say that _structural_ elements
> like field names, subfield names, the directory itself, separator chars,
> etc., all need to be (essentially, over-simplifying) 7-bit ASCII. The
> actual data itself is application dependent: 2709 doesn't care, and 2709
> doesn't give any standard cross-2709 way to determine it.
> 
> That is my conclusion at the moment, helped by all of you in this
> thread, thanks!
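
To make the bytes-versus-characters point above concrete, here is a minimal Python sketch. It assumes the MARC 21 directory layout (3-character tag, 4-digit field length, 5-digit starting position); ISO 2709 itself leaves those widths to the leader's entry map. The helper names (byte_len, encoded_field, directory_entry) are made up for illustration and are not from any MARC library. Python's standard library has no MARC-8 codec, so Latin-1 and UTF-8 stand in to show how the same text yields different directory lengths.

```python
FIELD_TERMINATOR = b"\x1e"    # ends every variable field
SUBFIELD_DELIMITER = b"\x1f"  # precedes each subfield code


def byte_len(text: str, encoding: str) -> int:
    """Number of bytes `text` occupies once encoded (hypothetical helper)."""
    return len(text.encode(encoding))


def encoded_field(text: str, encoding: str) -> bytes:
    """Two blank indicators, subfield $a, the data, then the field terminator."""
    return b"  " + SUBFIELD_DELIMITER + b"a" + text.encode(encoding) + FIELD_TERMINATOR


def directory_entry(tag: str, field: bytes, start: int) -> bytes:
    """MARC 21 style entry: 3-char tag, 4-digit byte length, 5-digit byte offset.

    The length is len() of the encoded bytes, never of the decoded string.
    """
    return f"{tag}{len(field):04d}{start:05d}".encode("ascii")


if __name__ == "__main__":
    # One character, different byte counts depending on the encoding.
    for encoding in ("latin-1", "utf-8", "utf-16-be"):
        print(encoding, byte_len("ñ", encoding))

    # Same four-character name, different directory lengths.
    for encoding in ("latin-1", "utf-8"):
        field = encoded_field("Peña", encoding)
        print(encoding, directory_entry("100", field, 0))
```

"Peña" is four characters either way, but the field is 9 bytes in Latin-1 and 10 bytes in UTF-8, and the directory records the byte figure. Nothing in the directory entry says which encoding produced those bytes.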