Hi Tod,

I'm not understanding how UTF-8 would be considered 8-bit character data (other than the ASCII range of the Unicode repertoire, natch).  I don't think ISO 2709 knows from characters, only bytes.
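
To make the byte/character distinction concrete, here is a small Python sketch (the sample string and the helper name `utf8_widths` are made up for illustration) that reports how many bytes each character occupies once encoded as UTF-8. Only the ASCII-range character fits in a single 8-bit byte; the others take two, three and four bytes.

```python
def utf8_widths(text):
    """Return (character, number of UTF-8 bytes) for each character in text."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


# ASCII letter, Latin-1 letter, CJK ideograph, character outside the BMP
sample = "A\u00e9\u6f22\U0001D11E"
for ch, width in utf8_widths(sample):
    print(f"U+{ord(ch):04X}: {width} byte(s)")

print(len(sample), "characters,", len(sample.encode("utf-8")), "bytes")
```

Anything in a record that counts "characters" in a fixed-width, byte-oriented structure has to count the encoded bytes, not the code points.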

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# [log in to unmask]
# http://rocky.uta.edu/doran/


> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> Tod Olson
> Sent: Wednesday, April 18, 2012 5:04 AM
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about
> ISO_2709 and MARC21
> 
> It has to mean UTF-8. ISO 2709 is very byte-oriented, from the directory
> structure to the byte offsets in the fixed fields. The values in these
> places all assume 8-bit character data; it's completely baked into the
> file format.
> 
> -Tod
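
Tod's point about byte offsets can be sketched in Python. The hypothetical helper `build_directory` below (not part of any MARC library) builds only the directory and field data, not the leader, using the MARC 21 entry map: a 3-character tag, a 4-digit field length and a 5-digit starting position, all counted in bytes from the base address of data and including each field's terminator. For a 245 field containing "Café", counting characters would give a length of 9 where the correct byte length is 10.

```python
FIELD_TERMINATOR = b"\x1e"    # ends each variable field and the directory
SUBFIELD_DELIMITER = "\x1f"   # precedes each subfield code inside a field


def build_directory(fields, encoding="utf-8"):
    """Return (directory, data) as bytes for a list of (tag, text) pairs.

    Hypothetical helper for illustration. Each 12-byte entry is a 3-byte tag,
    a 4-digit field length and a 5-digit starting position (the MARC 21 entry
    map, leader/20-23 = "4500"). Lengths and offsets are measured in encoded
    bytes and include each field's terminator.
    """
    directory = b""
    data = b""
    for tag, text in fields:
        field_bytes = text.encode(encoding) + FIELD_TERMINATOR
        directory += tag.encode("ascii") + b"%04d%05d" % (len(field_bytes), len(data))
        data += field_bytes
    return directory + FIELD_TERMINATOR, data


title = "10" + SUBFIELD_DELIMITER + "aCaf\u00e9"   # indicators, then $a
directory, data = build_directory([("001", "ocm12345"), ("245", title)])
print(directory)   # the 245 entry records a length of 0010 bytes
print(len(title) + 1, "characters vs", len(title.encode("utf-8")) + 1, "bytes")
```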
> 
> On Apr 17, 2012, at 6:55 PM, Jonathan Rochkind wrote:
> 
> > Okay, forget XML for a moment; let's just look at MARC 'binary'.
> >
> > First, for Anglophone-centric MARC21.
> >
> > The LC docs don't actually say quite what I thought about leader byte
> > 09, used to advertise encoding:
> >
> >
> > a - UCS/Unicode
> > Character coding in the record makes use of characters from the
> > Universal Coded Character Set (UCS) (ISO 10646), or Unicode(tm), an
> > industry subset.
> >
> >
> >
> > That doesn't say UTF-8. It says UCS or "Unicode". What does that
> > actually mean? Does it mean UTF-8, or does it mean UTF-16 (closer to
> > what used to be called "UCS", I think?). Whatever it actually means, do
> > people violate it in the wild?
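
Whether records violate leader/09 in the wild is something one could check mechanically. Below is a minimal Python sketch; `classify_record_encoding` is a hypothetical helper, and it assumes, as Tod argues, that the MARC 21 code "a" (UCS/Unicode) means UTF-8, while a blank means MARC-8. It only tests whether a record flagged "a" decodes as UTF-8; it does not validate MARC-8 content.

```python
def classify_record_encoding(record):
    """Classify a raw ISO 2709 record (bytes) by its MARC 21 leader/09.

    Hypothetical helper. ASSUMPTION: leader/09 = 'a' is read as UTF-8,
    which is exactly the interpretation being debated in this thread.
    """
    if len(record) < 24:
        raise ValueError("record shorter than the 24-byte leader")
    code = chr(record[9])  # the leader is ASCII, so byte 9 is leader/09
    if code == " ":
        return "MARC-8"
    if code != "a":
        return "undefined leader/09 value " + repr(code)
    try:
        record.decode("utf-8")
    except UnicodeDecodeError:
        return "claims Unicode but is not valid UTF-8"
    return "Unicode (valid UTF-8)"


unicode_leader = b"00000nam a2200000 a 4500"   # leader/09 = 'a'
print(classify_record_encoding(unicode_leader + "Caf\u00e9".encode("utf-8")))
print(classify_record_encoding(unicode_leader + b"Caf\xe9"))   # Latin-1 byte
```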
> >
> >
> >
> > Now we get to non-Anglophone-centric MARC, all of which I think is
> > ISO_2709? That standard, of course, is not open access, so I can't get
> > it to see what it says.
> >
> > But leader 09 being used for encoding -- is that MARC21-specific, or is
> > it true of any ISO-2709? MARC-8 and "Unicode" being the only valid
> > encodings can't be true of any ISO-2709, right?
> >
> > Is there a generic ISO-2709 way to deal with this, or not so much?