Okay, forget XML for a moment; let's just look at MARC 'binary'.

First, for Anglophone-centric MARC21.

The LC docs don't actually say quite what I thought about leader byte 
09, used to advertise encoding:


a - UCS/Unicode
Character coding in the record makes use of characters from the 
Universal Coded Character Set (UCS) (ISO 10646), or Unicode™, an 
industry subset.



That doesn't say UTF-8. It says UCS or "Unicode". What does that 
actually mean?  Does it mean UTF-8, or does it mean UTF-16 (closer to 
what used to be called "UCS" I think?).  Whatever it actually means, do 
people violate it in the wild?
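
To make the "in the wild" question concrete, here is a rough Python 
sketch of how one might check a raw record. It assumes the MARC21 
leader is 24 bytes, that position 09 is blank for MARC-8 and 'a' for 
UCS/Unicode, and (this is exactly the open question above) that 
"Unicode" in practice means UTF-8. The function names are made up for 
illustration; they are not from any MARC library.

```python
LEADER_LENGTH = 24  # an ISO 2709 / MARC21 leader is 24 bytes long


def leader_encoding(record: bytes) -> str:
    """Report what leader position 09 advertises (MARC21 values only)."""
    if len(record) < LEADER_LENGTH:
        raise ValueError("record is shorter than a 24-byte leader")
    flag = record[9:10]
    if flag == b"a":
        return "UCS/Unicode"
    if flag == b" ":
        return "MARC-8"
    return "undefined"  # anything else is not defined by MARC21


def is_valid_utf8(record: bytes) -> bool:
    """True if the raw bytes decode cleanly as UTF-8."""
    try:
        record.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def claims_unicode_but_not_utf8(record: bytes) -> bool:
    """ASSUMPTION: 'UCS/Unicode' is taken to mean UTF-8 here."""
    return leader_encoding(record) == "UCS/Unicode" and not is_valid_utf8(record)
```

Running claims_unicode_but_not_utf8() over a pile of records would 
only flag records that say 'a' but are not valid UTF-8. It can't tell 
you the reverse: a pure-ASCII MARC-8 record is also valid UTF-8, so 
decoding cleanly proves nothing about what the producer intended.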



Now we get to non-Anglophone-centric MARC, all of which I think is 
ISO 2709?  That standard of course is not open access, so I can't get 
hold of it to see what it says.

But leader 09 being used for encoding -- is that MARC21-specific, or is 
it true of any ISO 2709 format?  MARC-8 and "Unicode" being the only 
valid encodings can't be true of every ISO 2709 format, right?

Is there a generic ISO 2709 way to deal with this, or not so much?