So what if the <?xml?> declaration says one charset encoding, but the
MARC header included in the MarcXML says a different encoding... which
one is the 'legal' one to believe?

Is it legal to have MarcXML that is not UTF-8 _or_ Marc8, but is in an
entirely different charset that is legal in XML? If you did that, what
should the MARC header included in the XML say?

I know how char encodings work in XML. I don't understand what the
standards say about how that interacts with the MARC data in MarcXML.

Jonathan

On 4/17/2012 1:51 PM, LeVan,Ralph wrote:
> There are probably a couple of answers to that.
>
> XML rules define what character set is used. The "encoding" attribute
> on the <?xml?> header is where you find out what character set is
> being used.
>
> I've always gone under the assumption that if an encoding wasn't
> specified, then UTF-8 is in effect and that has always worked for me.
> As far as I can tell, the XML spec agrees: without an encoding
> declaration or a byte order mark, a document is read as UTF-8.
> US-ASCII is the default only for text/xml served over HTTP without a
> charset parameter (RFC 3023).
>
> But, ignoring the encoding, the original MarcXML rules were the same
> as the MARC-21 rules for character repertoire and you were supposed to
> restrict yourself to characters that could be mapped back into MARC-8.
> I don't know if that rule is still in force, but everyone ignores it.
>
> I hope that helps!
>
> Ralph
>
> -----Original Message-----
> From: Code for Libraries On Behalf Of Jonathan Rochkind
> Sent: Tuesday, April 17, 2012 12:35 PM
> Subject: MarcXML and char encodings
>
> I know how char encodings work in MARC ISO binary -- the encoding can
> legally be either Marc8 or UTF8 (nothing else). The encoding of a
> record is specified in its header. In the wild, specified encodings
> are frequently wrong, or data includes weird mixed encodings. Okay!
>
> But what's going on with MarcXML? What are the legal encodings for
> MarcXML? Only Marc8 and UTF8, or anything that can be expressed in
> XML? The MARC header is (or can be) present in MarcXML -- trust the
> MARC header, or trust the XML declaration's char encoding?
>
> What's the legal thing to do? What's actually found 'in the wild' with
> MarcXML?
>
> Can anyone advise?
>
> Jonathan
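
For anyone who wants to see what their own MarcXML files actually claim, here is a rough Python sketch that compares the two declarations. It does not settle which one is 'legal'; it only reports records where the <?xml?> encoding and the MARC header disagree. It assumes the header is the MARCXML <leader> element in the http://www.loc.gov/MARC21/slim namespace, and that leader position 09 follows MARC 21 ('a' for UCS/Unicode, blank for MARC-8). The function names and the sample data are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by MARCXML ("MARC21 slim") documents.
MARC_NS = "{http://www.loc.gov/MARC21/slim}"

# Matches the encoding pseudo-attribute of an XML declaration.
# Assumption: the declaration itself is in an ASCII-compatible encoding.
_DECL_RE = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_xml_encoding(raw: bytes) -> str:
    """Return the encoding named in the XML declaration (upper-cased).

    Falls back to UTF-8 when no encoding is declared.
    """
    if raw.startswith(b"\xef\xbb\xbf"):  # skip a UTF-8 byte order mark
        raw = raw[3:]
    match = _DECL_RE.match(raw)
    return match.group(1).decode("ascii").upper() if match else "UTF-8"


def leader_encodings(raw: bytes) -> list[str]:
    """Return what leader position 09 claims for each record, in order."""
    root = ET.fromstring(raw)
    claims = []
    for leader in root.iter(MARC_NS + "leader"):
        flag = (leader.text or "")[9:10]
        if flag == "a":
            claims.append("UTF-8")    # MARC 21: 'a' means UCS/Unicode
        elif flag == " ":
            claims.append("MARC-8")   # MARC 21: blank means MARC-8
        else:
            claims.append("unknown")
    return claims


def encoding_mismatches(raw: bytes) -> list[tuple[int, str, str]]:
    """List (record index, XML encoding, leader encoding) disagreements.

    The comparison is a plain string match, so aliases such as "UTF8"
    would need normalizing before this is used on real data.
    """
    xml_enc = declared_xml_encoding(raw)
    return [
        (index, xml_enc, marc_enc)
        for index, marc_enc in enumerate(leader_encodings(raw))
        if marc_enc != xml_enc
    ]


# Made-up sample: Latin-1 XML whose leaders claim Unicode and MARC-8.
SAMPLE = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>'
    b'<collection xmlns="http://www.loc.gov/MARC21/slim">'
    b"<record><leader>00000nam a2200000 a 4500</leader></record>"
    b"<record><leader>00000nam  2200000 a 4500</leader></record>"
    b"</collection>"
)

if __name__ == "__main__":
    for index, xml_enc, marc_enc in encoding_mismatches(SAMPLE):
        print(f"record {index}: XML says {xml_enc}, leader says {marc_enc}")
```

Note that a leader claiming MARC-8 will always be flagged, since MARC-8 is not a name an XML declaration would normally carry. That is the same ambiguity the question is about, so treat the output as a survey of what is in the data rather than a verdict.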