> ISO 2709 doesn't care how many bytes your characters are. The directory
> and offsets and other things count bytes, not characters.

That was exactly my point. (Which I am stating since you quoted me and I couldn't tell if you were refuting my point, or using it to support your conclusion.) ;-)

-- Michael

> -----Original Message-----
> From: Jonathan Rochkind [mailto:[log in to unmask]]
> Sent: Wednesday, April 18, 2012 11:09 AM
> To: Code for Libraries
> Cc: Doran, Michael D
> Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about
> ISO_2709 and MARC21
>
> On 4/18/2012 11:09 AM, Doran, Michael D wrote:
> > I don't believe that is the case. Take UTF-8 out of the picture, and
> > consider the MARC-8 character set with its escape sequences and combining
> > characters. A character such as an "n" with a tilde would consist of two
> > bytes. The Greek small letter alpha, if invoked in accordance with ANSI
> > X3.41, would consist of five bytes (two bytes for the initial escape
> > sequence, a byte for the character, and then two bytes for the escape
> > sequence returning to the default character set).
>
> ISO 2709 doesn't care how many bytes your characters are. The directory
> and offsets and other things count bytes, not characters. (which was, in
> my opinion, the _right_ decision, for once with marc!)
>
> How bytes translate into characters is not a concern of ISO 2709.
>
> The majority of non-7-bit-ASCII encodings will have chars that are more
> than one byte, either sometimes or always. This is true of MARC8 (some
> chars), UTF8 (some chars), and UTF16 (all chars), all of them. (It is
> not true of Latin-1 though, for instance, I don't think).
>
> ISO 2709 doesn't care what char encodings you use, and there's no
> standard ISO 2709 way to determine what char encodings are used for
> _data_ in the MARC record. ISO 2709 does say that _structural_ elements
> like field names, subfield names, the directory itself, separator chars,
> etc, all need to be (essentially, over-simplifying) 7-bit-ASCII. The
> actual data itself is application dependent, 2709 doesn't care, and 2709
> doesn't give any standard cross-2709 way to determine it.
>
> That is my conclusion at the moment, helped by all of you all in this
> thread, thanks!
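
To make the bytes-versus-characters point concrete, here is a minimal Python sketch. It is not from the thread, and the function names (`byte_lengths`, `build_record`, `read_fields`) are made up for illustration. It assumes the MARC21-style layout of a 24-byte leader and 12-byte directory entries (3-digit tag, 4-digit length, 5-digit start offset), and it fills the non-length leader positions with placeholder values. Python has no built-in MARC-8 codec, so UTF-8 and Latin-1 stand in as the data encodings.

```python
FT = b"\x1e"  # field terminator
RT = b"\x1d"  # record terminator
SF = b"\x1f"  # subfield delimiter


def byte_lengths(s):
    """Number of bytes the same string occupies in a few encodings."""
    return {enc: len(s.encode(enc)) for enc in ("latin-1", "utf-8", "utf-16-be")}


def build_record(fields, encoding):
    """Build an ISO 2709-style record (illustrative, data fields only).

    fields: list of (tag, indicators, [(subfield_code, value), ...]).
    Structural parts are ASCII; only subfield values use `encoding`.
    """
    directory = b""
    body = b""
    for tag, indicators, subfields in fields:
        data = indicators.encode("ascii")
        for code, value in subfields:
            data += SF + code.encode("ascii") + value.encode(encoding)
        data += FT
        # Length and start offset are counted in BYTES, not characters.
        directory += f"{tag}{len(data):04d}{len(body):05d}".encode("ascii")
        body += data
    base = 24 + len(directory) + 1          # leader + directory + terminator
    total = base + len(body) + 1            # + record terminator
    # Positions 5-11 and 17-23 are placeholder filler for this sketch.
    leader = f"{total:05d}nam  22{base:05d}   4500".encode("ascii")
    assert len(leader) == 24
    return leader + directory + FT + body + RT


def read_fields(record):
    """Slice fields out of a record using only the byte counts in the directory."""
    base = int(record[12:17])
    directory = record[24:base - 1]
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3:7])
        start = int(entry[7:12])
        fields.append((tag, record[base + start:base + start + length]))
    return fields


if __name__ == "__main__":
    print(byte_lengths("\u00f1"))  # n with tilde, a single character
    title = [("245", "10", [("a", "A\u00f1os")])]
    for enc in ("utf-8", "latin-1"):
        rec = build_record(title, enc)
        print(enc, rec[24:36], read_fields(rec))
```

Note that `read_fields` never decodes the field data: it can split the record correctly without knowing the encoding, but it also has no way to learn the encoding from the record structure itself.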