I know how char encodings work in MARC ISO binary -- the encoding can
legally be either Marc8 or UTF8 (nothing else). The encoding of a
record is specified in its header. In the wild, specified encodings are
frequently wrong, or data includes weird mixed encodings. Okay!
But what's going on with MarcXML? What are the legal encodings for
MarcXML? Only Marc8 and UTF8, or anything that can be expressed in
XML? The MARC header is (or can be) present in MarcXML -- trust the
MARC header, or trust the XML declaration's char encoding?
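
To make the conflict concrete, here is a rough Python sketch (standard library only) that pulls both declared encodings out of a MarcXML document so they can be compared. It assumes the record uses the usual MARC21 slim namespace and that the file is in an ASCII-compatible encoding; the function name declared_encodings is just made up for illustration. Leader position 09 is blank for Marc8 and 'a' for UCS/Unicode. This only reports what each place claims; it doesn't settle which one to trust.

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by MARC21 slim MarcXML records.
MARCXML_NS = "{http://www.loc.gov/MARC21/slim}"


def declared_encodings(raw: bytes):
    """Return (xml_declaration_encoding, leader_encoding) for a MarcXML document.

    Hypothetical helper: it reports the two claims, it does not validate them.
    """
    # Encoding named in <?xml ... encoding="..."?>; XML defaults to UTF-8
    # when the declaration (or its encoding attribute) is missing.
    m = re.match(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']', raw)
    xml_enc = m.group(1).decode("ascii") if m else "UTF-8"

    # First <leader> found, whether the root is <record> or <collection>.
    root = ET.fromstring(raw)
    leader = next(root.iter(MARCXML_NS + "leader"), None)
    text = (leader.text or "") if leader is not None else ""

    # Leader position 09 (0-indexed): 'a' = UCS/Unicode, blank = Marc8.
    if len(text) > 9 and text[9] == "a":
        leader_enc = "UCS/Unicode"
    elif len(text) > 9 and text[9] == " ":
        leader_enc = "MARC-8"
    else:
        leader_enc = None  # leader missing, too short, or an unexpected value

    return xml_enc, leader_enc
```

Running this over a batch of records and counting the mismatched pairs would at least show how often the two disagree in practice.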
What's the legal thing to do? What's actually found 'in the wild' with
MarcXML?
Can anyone advise?
Jonathan