At the time of creation, characters and bytes were 1-to-1 because MARC
used only ASCII. So there was no distinction at the outset. Some
positions are still limited to ASCII characters (Leader, fixed fields,
subfield codes, etc.).

kc

On 4/18/12 7:20 AM, Huwig,Steve wrote:
> I could be mistaken (never having had the pleasure of reading it), but
> isn't ISO-2709 specified as a fixed number of characters, and any
> conflation of characters and 8-bit bytes is on the part of users and
> implementations?
>
> I think ISO 2709 might not know from bytes, only characters.
>
>> -----Original Message-----
>> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
>> Doran, Michael D
>> Sent: Wednesday, April 18, 2012 10:05 AM
>> To: [log in to unmask]
>> Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about
>> ISO_2709 and MARC21
>>
>> Hi Tod,
>>
>> I'm not understanding how UTF-8 would be considered 8-bit character
>> data (other than the ASCII-range of the Unicode repertoire, natch). I
>> don't think ISO 2709 knows from characters, only bytes.
>>
>> -- Michael
>>
>> # Michael Doran, Systems Librarian
>> # University of Texas at Arlington
>> # 817-272-5326 office
>> # 817-688-1926 mobile
>> # [log in to unmask]
>> # http://rocky.uta.edu/doran/
>>
>>
>>> -----Original Message-----
>>> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
>>> Tod Olson
>>> Sent: Wednesday, April 18, 2012 5:04 AM
>>> To: [log in to unmask]
>>> Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about
>>> ISO_2709 and MARC21
>>>
>>> It has to mean UTF-8. ISO 2709 is very byte-oriented, from the directory
>>> structure to the byte-offsets in the fixed fields. The values in these
>>> places all assume 8-bit character data, it's completely baked in to the
>>> file format.
>>>
>>> -Tod
>>>
>>> On Apr 17, 2012, at 6:55 PM, Jonathan Rochkind wrote:
>>>
>>>> Okay, forget XML for a moment, let's just look at marc 'binary'.
>>>>
>>>> First, for Anglophone-centric MARC21.
>>>>
>>>> The LC docs don't actually say quite what I thought about leader byte
>>>> 09, used to advertise encoding:
>>>>
>>>>
>>>> a - UCS/Unicode
>>>> Character coding in the record makes use of characters from the
>>>> Universal Coded Character Set (UCS) (ISO 10646), or Unicode(tm), an
>>>> industry subset.
>>>>
>>>>
>>>>
>>>> That doesn't say UTF-8. It says UCS or "Unicode". What does that
>>>> actually mean? Does it mean UTF-8, or does it mean UTF-16 (closer to
>>>> what used to be called "UCS" I think?). Whatever it actually means, do
>>>> people violate it in the wild?
>>>>
>>>>
>>>>
>>>> Now we get to non-Anglophone centric marc. I think all of which is
>>>> ISO_2709? A standard which of course is not open access, so I can't get
>>>> it to see what it says.
>>>>
>>>> But leader 09 being used for encoding -- is that Marc21 specific, or is
>>>> it true of any ISO-2709? Marc8 and "unicode" being the only valid
>>>> encodings can't be true of any ISO-2709, right?
>>>>
>>>> Is there a generic ISO-2709 way to deal with this, or not so much?

-- 
Karen Coyle
[log in to unmask] http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
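
To make the character/byte distinction in this thread concrete, here is a minimal Python sketch. It assumes, as Tod describes, that directory lengths and offsets are counted in bytes of the encoded data. The helper name directory_entry, the tags, and the sample values are hypothetical and chosen only for illustration; the 12-character entry layout (3-character tag, 4-digit field length, 5-digit starting position) is the MARC21 one, not something verified here against the ISO 2709 text.

```python
# Illustrative sketch only: character count vs. UTF-8 byte count for a field,
# and the effect on a MARC21-style directory entry.

FIELD_TERMINATOR = b"\x1e"  # field terminator byte used in MARC records


def directory_entry(tag: str, data: str, start: int) -> str:
    """Build a 12-character directory entry (hypothetical helper).

    Assumption: field length and starting position are byte counts of the
    UTF-8-encoded data, with the field terminator included in the length.
    """
    encoded = data.encode("utf-8") + FIELD_TERMINATOR
    return f"{tag}{len(encoded):04d}{start:05d}"


ascii_value = "Zurich"       # ASCII only: one byte per character
utf8_value = "Z\u00fcrich"   # "Zurich" with u-umlaut (U+00FC): two bytes in UTF-8

# Character count and byte count agree for the ASCII value...
print(len(ascii_value), len(ascii_value.encode("utf-8")))
# ...and differ once a non-ASCII character appears.
print(len(utf8_value), len(utf8_value.encode("utf-8")))

# The directory entry records the byte length (7 data bytes + 1 terminator).
print(directory_entry("651", utf8_value, 0))
```

With an ASCII-only value the two counts coincide, which is the 1-to-1 situation described at the top of this message. With UTF-8 data, software that computes lengths in characters rather than bytes would write a directory that points at the wrong offsets.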