So what if the <?xml?> declaration says one charset encoding, but the
MARC header included in the MarcXML says a different encoding... which
one is the 'legal' one to believe?
Is it legal to have MarcXML that is not UTF-8 _or_ Marc8, but in some
entirely different charset that is legal in XML? If you did that, what
should the MARC header included in the XML say?
I know how char encodings work in XML. I don't understand what the
standards say about how that interacts with the MARC data in MarcXML.
Jonathan
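A minimal sketch (Python standard library only; the function names and the sample record are made up for illustration) that reports both claims for a MarcXML document, so a conflict like the one above can at least be detected. It assumes the MARC 21 convention that Leader/09 is blank for MARC-8 and "a" for UCS/Unicode. It does not settle which claim is authoritative.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat

MARCXML_NS = "{http://www.loc.gov/MARC21/slim}"

# MARC 21 Leader/09 (character coding scheme): blank = MARC-8, "a" = UCS/Unicode.
LEADER_09 = {" ": "MARC-8", "a": "UCS/Unicode"}


def declared_xml_encoding(data: bytes):
    """Return the encoding named in the <?xml ...?> declaration, or None if absent."""
    found = {}

    def on_decl(version, encoding, standalone):
        found["encoding"] = encoding  # None when the declaration omits encoding=

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_decl
    parser.Parse(data, True)
    return found.get("encoding")


def leader_coding_schemes(data: bytes):
    """Return what Leader/09 claims for each <leader> in a MarcXML document."""
    root = ET.fromstring(data)
    claims = []
    for leader in root.iter(MARCXML_NS + "leader"):
        text = leader.text or ""
        code = text[9] if len(text) > 9 else None
        claims.append(LEADER_09.get(code, f"unknown ({code!r})"))
    return claims


# Hypothetical record: the XML says UTF-8, but Leader/09 is blank (MARC-8).
doc = b"""<?xml version="1.0" encoding="UTF-8"?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam  2200000   4500</leader>
</record>"""

print(declared_xml_encoding(doc), leader_coding_schemes(doc))  # UTF-8 ['MARC-8']
```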
On 4/17/2012 1:51 PM, LeVan,Ralph wrote:
> There are probably a couple of answers to that.
>
> XML rules define which character set is used. The "encoding" attribute in
> the <?xml?> declaration is where you find out which character set is being
> used.
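To illustrate that point, a small Python sketch (standard library only; the sample subfield is invented): the same MarcXML text saved as UTF-8 and as ISO-8859-1, each with a matching declaration, parses to identical characters. The XML parser decodes the bytes before any MARC-level code sees them.

```python
import xml.etree.ElementTree as ET

# Invented MarcXML fragment; only the encoding in the declaration varies.
TEMPLATE = (
    '<?xml version="1.0" encoding="{enc}"?>'
    '<subfield xmlns="http://www.loc.gov/MARC21/slim" code="a">Café</subfield>'
)


def parsed_text(enc: str) -> str:
    """Serialize the fragment in `enc`, then let the XML parser decode it."""
    data = TEMPLATE.format(enc=enc).encode(enc)
    return ET.fromstring(data).text


print(parsed_text("UTF-8") == parsed_text("ISO-8859-1"))  # True
```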
>
> I've always gone under the assumption that if an encoding wasn't
> specified, then UTF-8 is in effect and that has always worked for me.
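> It turns out that is what the XML 1.0 spec itself says (UTF-8, or UTF-16
> with a byte order mark); the US-ASCII default comes from RFC 3023, for
> text/xml sent over HTTP without a charset parameter.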
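For reference, XML 1.0 (section 4.3.3 and Appendix F) has a parser look for a byte order mark, then the encoding declaration, and otherwise assume UTF-8. An external charset, such as an HTTP Content-Type parameter, can override this and is not modeled here. The sketch below is a simplified Python version of those rules (the function name is hypothetical; UTF-16 without a BOM, UTF-32 and EBCDIC are deliberately ignored).

```python
import codecs
import re

# Matches an encoding pseudo-attribute in a declaration at the very start of the data.
_DECL = re.compile(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def sniff_xml_encoding(data: bytes) -> str:
    """Simplified XML 1.0 autodetection: BOM, then declaration, then UTF-8."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "UTF-16"
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii")
    # No BOM and no encoding declaration: absent external info, XML 1.0 requires UTF-8.
    return "UTF-8"


print(sniff_xml_encoding(b"<record/>"))  # UTF-8
```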
>
> But, ignoring the encoding, the original MarcXML rules were the same as
> the MARC-21 rules for character repertoire, and you were supposed to
> restrict yourself to characters that could be mapped back into MARC-8.
> I don't know if that rule is still in force, but everyone ignores it.
>
> I hope that helps!
>
> Ralph
>
> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> Jonathan Rochkind
> Sent: Tuesday, April 17, 2012 12:35 PM
> To: [log in to unmask]
> Subject: MarcXML and char encodings
>
> I know how char encodings work in MARC ISO binary -- the encoding can
> legally be either Marc8 or UTF8 (nothing else). The encoding of a
> record is specified in its header. In the wild, specified encodings are
> frequently wrong, or data includes weird mixed encodings. Okay!
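For the binary case, a minimal Python sketch (standard library only; function names and sample bytes are hypothetical) of reading that header claim: in MARC 21, Leader/09 (the tenth byte of the 24-byte leader) is blank for MARC-8 and "a" for UCS/Unicode. Since the claim is frequently wrong, it is paired with a crude UTF-8 check. A pure-ASCII record passes either way, so this can flag suspect records but cannot prove an encoding.

```python
def leader_coding_scheme(record: bytes) -> str:
    """Return the character coding scheme an ISO 2709 MARC 21 record claims."""
    if len(record) < 24:
        raise ValueError("record is shorter than the 24-byte leader")
    code = chr(record[9])
    if code == " ":
        return "MARC-8"
    if code == "a":
        return "UCS/Unicode"
    return f"invalid Leader/09 value {code!r}"


def decodes_as_utf8(record: bytes) -> bool:
    """Heuristic: MARC-8 diacritics (0xE0-0xFE before a base letter) usually break UTF-8."""
    try:
        record.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical record fragment: Leader/09 says "a" and the body really is UTF-8.
record = b"00000nam a2200000   4500" + "Café".encode("utf-8")
print(leader_coding_scheme(record), decodes_as_utf8(record))  # UCS/Unicode True
```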
>
> But what's going on with MarcXML? What are the legal encodings for
> MarcXML? Only Marc8 and UTF8, or anything that can be expressed in
> XML? The MARC header is (or can be) present in MarcXML -- trust the
> MARC header, or trust the char encoding in the XML declaration?
>
> What's the legal thing to do? What's actually found 'in the wild' with
> MarcXML?
>
> Can anyone advise?
>
> Jonathan
>