> I could be mistaken (never having had the pleasure of reading it), but
> isn't ISO-2709 specified as a fixed number of characters, and any
> conflation of characters and 8-bit bytes is on the part of users and
> implementations?

I don't believe that is the case.  Take UTF-8 out of the picture, and consider the MARC-8 character set with its escape sequences and combining characters.  A character such as an "n" with a tilde would consist of two bytes (the combining tilde followed by the base letter).  The Greek small letter alpha, if invoked in accordance with ANSI X3.41, would consist of five bytes (two bytes for the initial escape sequence, a byte for the character, and then two bytes for the escape sequence returning to the default character set).  Each of these is a single character, so a length counted in characters would not match one counted in bytes.
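
A minimal sketch of the byte counts above, in Python.  It assumes the usual MARC-8 code points (combining tilde at 0xE4, ESC g to invoke the Greek symbol set, ESC s to return to the default set); the variable names are only for illustration.

```python
ESC = b"\x1b"

# MARC-8 "n" with tilde: the combining tilde (assumed 0xE4) comes first,
# followed by the base letter "n" (0x6E).
n_tilde_marc8 = b"\xe4" + b"n"

# MARC-8 Greek small letter alpha: ESC g invokes the Greek symbol set,
# 0x61 is alpha within that set, and ESC s returns to the default set.
alpha_marc8 = ESC + b"g" + b"\x61" + ESC + b"s"

# For comparison, the same alpha encoded as UTF-8.
alpha_utf8 = "\u03b1".encode("utf-8")

for label, data in [
    ("n-tilde (MARC-8)", n_tilde_marc8),
    ("alpha (MARC-8)", alpha_marc8),
    ("alpha (UTF-8)", alpha_utf8),
]:
    # One character in every case, but a different number of bytes.
    print(f"{label}: 1 character, {len(data)} bytes")
```

In each case a reader sees one character, but a length or offset computed over the record would have to count two, five, and two bytes respectively.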

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# [log in to unmask]
# http://rocky.uta.edu/doran/

> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> Huwig,Steve
> Sent: Wednesday, April 18, 2012 9:21 AM
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about
> ISO_2709 and MARC21
> 
> I could be mistaken (never having had the pleasure of reading it), but
> isn't ISO-2709 specified as a fixed number of characters, and any
> conflation of characters and 8-bit bytes is on the part of users and
> implementations?
> 
> I think ISO 2709 might not know from bytes, only characters.
> 
> > -----Original Message-----
> > From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> > Doran, Michael D
> > Sent: Wednesday, April 18, 2012 10:05 AM
> > To: [log in to unmask]
> > Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about
> > ISO_2709 and MARC21
> >
> > Hi Tod,
> >
> > I'm not understanding how UTF-8 would be considered 8-bit character
> > data (other than the ASCII-range of the Unicode repertoire, natch).  I
> > don't think ISO 2709 knows from characters, only bytes.
> >
> > -- Michael
> >
> > # Michael Doran, Systems Librarian
> > # University of Texas at Arlington
> > # 817-272-5326 office
> > # 817-688-1926 mobile
> > # [log in to unmask]
> > # http://rocky.uta.edu/doran/
> >
> >
> > > -----Original Message-----
> > > From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> > > Tod Olson
> > > Sent: Wednesday, April 18, 2012 5:04 AM
> > > To: [log in to unmask]
> > > Subject: Re: [CODE4LIB] more on MARC char encoding: Now we're about
> > > ISO_2709 and MARC21
> > >
> > > > It has to mean UTF-8. ISO 2709 is very byte-oriented, from the
> > > > directory structure to the byte-offsets in the fixed fields. The
> > > > values in these places all assume 8-bit character data, it's
> > > > completely baked in to the file format.
> > >
> > > -Tod
> > >
> > > On Apr 17, 2012, at 6:55 PM, Jonathan Rochkind wrote:
> > >
> > > > Okay, forget XML for a moment, let's just look at marc 'binary'.
> > > >
> > > > First, for Anglophone-centric MARC21.
> > > >
> > > > > The LC docs don't actually say quite what I thought about leader
> > > > > byte 09, used to advertise encoding:
> > > >
> > > >
> > > > a - UCS/Unicode
> > > > > Character coding in the record makes use of characters from the
> > > > > Universal Coded Character Set (UCS) (ISO 10646), or Unicode(tm), an
> > > > > industry subset.
> > > >
> > > >
> > > >
> > > > > That doesn't say UTF-8. It says UCS or "Unicode". What does that
> > > > > actually mean?  Does it mean UTF-8, or does it mean UTF-16 (closer to
> > > > > what used to be called "UCS" I think?).  Whatever it actually means,
> > > > > do people violate it in the wild?
> > > >
> > > >
> > > >
> > > > > Now we get to non-Anglophone centric marc. I think all of which is
> > > > > ISO_2709?  A standard which of course is not open access, so I can't
> > > > > get it to see what it says.
> > > >
> > > > > But leader 09 being used for encoding -- is that Marc21 specific, or
> > > > > is it true of any ISO-2709?  Marc8 and "unicode" being the only valid
> > > > > encodings can't be true of any ISO-2709, right?
> > > >
> > > > Is there a generic ISO-2709 way to deal with this, or not so much?