
So what if the <?xml?> declaration says one character encoding, but the
MARC header included in the MarcXML says a different encoding... which
one is the 'legal' one to believe?

Is it legal to have MarcXML that is neither UTF-8 _nor_ Marc8, but some
entirely different charset that XML allows?  If you did that, what
should the MARC header included in the XML say?
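One way to look at both claims side by side is to pull leader/09 (the MARC 21 character coding scheme position: blank for MARC-8, `a` for UCS/Unicode) out of each record. A minimal sketch in Python, assuming the standard MARCXML namespace `http://www.loc.gov/MARC21/slim`; `leader_09_values` is a hypothetical helper name. Note that the XML parser has already decoded the document by the XML rules before this code looks at anything, so leader/09 is only a label here; the sketch reports it but does not settle which one is 'legal'.

```python
import xml.etree.ElementTree as ET

MARC_NS = "{http://www.loc.gov/MARC21/slim}"

# MARC 21 leader position 09, character coding scheme:
# blank = MARC-8, "a" = UCS/Unicode.
LEADER_09 = {" ": "MARC-8", "a": "UCS/Unicode"}


def leader_09_values(marcxml: bytes) -> list:
    """Return what leader/09 declares for each <record> in a MarcXML document."""
    # Passing bytes lets the parser apply the <?xml?> encoding declaration;
    # the leader plays no part in how the text is decoded.
    root = ET.fromstring(marcxml)
    results = []
    # iter() includes the root itself, so this works for a bare <record>
    # as well as for a <collection> of records.
    for record in root.iter(MARC_NS + "record"):
        leader = record.find(MARC_NS + "leader")
        if leader is None or leader.text is None or len(leader.text) < 10:
            results.append(None)  # leader missing or too short to have /09
        else:
            code = leader.text[9]
            results.append(LEADER_09.get(code, "unrecognized code %r" % code))
    return results
```

A document declared as ISO-8859-1 whose leader says MARC-8 would parse without complaint and come back as `["MARC-8"]`, which is exactly the kind of mismatch asked about here.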

I know how char encodings work in XML.  I don't understand what the 
standards say about how that interacts with the MARC data in MarcXML.

Jonathan

On 4/17/2012 1:51 PM, LeVan,Ralph wrote:
> There are probably a couple of answers to that.
>
> XML rules define what character set is used. The "encoding" attribute on
> the <?xml?> declaration is where you find out what character set is being
> used.
>
> I've always gone under the assumption that if an encoding wasn't
> specified, then UTF-8 is in effect and that has always worked for me.
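> That matches XML 1.0, which says a document with no encoding declaration
> and no byte order mark must be UTF-8. The US-ASCII default comes from
> RFC 3023, for text/xml sent over HTTP without a charset parameter.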
>
> But, ignoring the encoding, the original MarcXML rules were the same as
> the MARC 21 rules for character repertoire, and you were supposed to
> restrict yourself to characters that could be mapped back into MARC-8.
> I don't know if that rule is still in force, but everyone ignores it.
>
> I hope that helps!
>
> Ralph
>
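Ralph's point that the XML layer, not the MARC leader, decides the character set can be sketched as a byte-level check. This is a simplified version of the autodetection described in XML 1.0 Appendix F, written in Python; `sniff_xml_encoding` is a hypothetical name, and it deliberately ignores UTF-16 without a BOM, UTF-32, EBCDIC, and any charset supplied by a transport layer such as an HTTP Content-Type header. The final fallback is the XML 1.0 rule for an entity with neither a BOM nor an encoding declaration.

```python
import codecs
import re

# Encoding pseudo-attribute of an XML declaration in an ASCII-compatible
# encoding, e.g. <?xml version="1.0" encoding="ISO-8859-1"?>.
# The backreference \1 requires matching opening and closing quotes.
_DECL_ENCODING = re.compile(
    rb"""<\?xml[^>]*?\sencoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1"""
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess a document's encoding: a simplified XML 1.0 Appendix F check."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "UTF-16"
    # match() anchors at byte 0, where an XML declaration must appear.
    match = _DECL_ENCODING.match(data)
    if match:
        return match.group(2).decode("ascii")
    # Neither a BOM nor an encoding declaration: XML 1.0 requires UTF-8.
    return "UTF-8"
```

Nothing in this check looks inside the MARC data; once the encoding is known, any XML parser can take over, and the leader is just another string in the decoded text.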
> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> Jonathan Rochkind
> Sent: Tuesday, April 17, 2012 12:35 PM
> To: [log in to unmask]
> Subject: MarcXML and char encodings
>
> I know how char encodings work in MARC ISO binary -- the encoding can
> legally be either Marc8 or UTF8 (nothing else).  The encoding of a
> record is specified in its header (leader position 09). In the wild,
> specified encodings are frequently wrong, or data includes weird mixed
> encodings. Okay!
>
> But what's going on with MarcXML?  What are the legal encodings for
> MarcXML?  Only Marc8 and UTF8, or anything that can be expressed in
> XML?  The MARC header is (or can be) present in MarcXML -- trust the
> MARC header, or trust the encoding in the XML declaration?
>
> What's the legal thing to do? What's actually found 'in the wild' with
> MarcXML?
>
> Can anyone advise?
>
> Jonathan
>
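For the ISO 2709 binary case in the original question, the declared scheme is the single byte at leader/09, which can be read without decoding the rest of the record because the leader is ASCII. A minimal Python sketch with hypothetical helper names; since declared encodings are often wrong, it also tries a strict UTF-8 decode. Passing that decode does not prove the record is Unicode (a pure-ASCII MARC-8 record passes too), but failing it shows an `a` in leader/09 cannot be trusted.

```python
def iso2709_leader_09(record: bytes) -> str:
    """Report the coding scheme declared in leader/09 of an ISO 2709 record."""
    if len(record) < 24:
        raise ValueError("record shorter than the 24-byte leader")
    code = chr(record[9])
    if code == " ":
        return "MARC-8"
    if code == "a":
        return "UCS/Unicode"
    # Any other value is not a valid MARC 21 coding scheme.
    return "invalid leader/09 value %r" % code


def declared_vs_actual(record: bytes) -> tuple[str, bool]:
    """Return (declared scheme, whether the raw bytes decode as UTF-8)."""
    try:
        record.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return iso2709_leader_09(record), valid_utf8
```

A record claiming `a` whose bytes fail the UTF-8 check is one of the "specified encodings are frequently wrong" cases described above.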