On 4/17/2012 1:57 PM, Kyle Banerjee wrote:
> In some cases, invalid XML. In an ideal world, the encoding should be
> included in the declaration. But I wouldn't trust it. kyle
So would you use the Marc header payload instead?
Or are you just saying you wouldn't trust _any_ encoding declarations you
find anywhere?
When writing a library to handle MARC, I think the baseline should be
making it do the official, legal, standards-compliant right thing. Extra
heuristics to deal with invalid data can be added on top.
But my trouble here is I can't even figure out what the official legal
standards-compliant thing is.
Maybe that's because the MarcXML standard simply doesn't address it, and
it's all implementation-dependent. Sigh.
The problem is how the XML document's own character encoding is supposed
to interact with the MARC header, especially since there's no way to name
Marc8 in an XML encoding declaration (is there?). There's also the
question of whether encodings other than Marc8 or UTF8 are legal in
MarcXML, even though they aren't in MARC ISO binary.
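To make the conflict concrete, here's a rough Python sketch that just pulls out the two signals side by side: the encoding named in the XML declaration, and the character coding scheme in leader position 09 (per MARC 21, blank means MARC-8 and 'a' means UCS/Unicode). The function names are made up for illustration, and it deliberately doesn't decide which signal wins, since that's exactly the part I can't find a rule for.

```python
import re
import xml.etree.ElementTree as ET

MARC_NS = "{http://www.loc.gov/MARC21/slim}"

# The XML declaration, if present, must be at the very start of the
# document (optionally after a UTF-8 byte order mark).
_DECL_RE = re.compile(
    rb'^(?:\xef\xbb\xbf)?<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

# MARC 21 leader/09 values: blank = MARC-8, 'a' = UCS/Unicode.
_LEADER_09 = {" ": "MARC-8", "a": "UCS/Unicode"}


def declared_xml_encoding(data: bytes):
    """Return the encoding named in the XML declaration (lowercased), or None."""
    m = _DECL_RE.match(data)
    return m.group(1).decode("ascii").lower() if m else None


def leader_coding_scheme(data: bytes):
    """Return what leader/09 of the first MARCXML record claims, or None."""
    root = ET.fromstring(data)
    # Works for a bare <record> root or a <collection> wrapper.
    leader = root.find(f".//{MARC_NS}leader")
    if leader is None or leader.text is None or len(leader.text) < 10:
        return None
    return _LEADER_09.get(leader.text[9], "unknown")


def encoding_signals(data: bytes):
    """Hypothetical helper: report both signals without reconciling them."""
    return declared_xml_encoding(data), leader_coding_scheme(data)


# A record whose XML says UTF-8 but whose leader/09 is blank (MARC-8).
sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<record xmlns="http://www.loc.gov/MARC21/slim">'
    b'<leader>01142cam  2200301 a 4500</leader></record>'
)

if __name__ == "__main__":
    print(encoding_signals(sample))
```

Running it on the sample gives ("utf-8", "MARC-8"): the XML parser happily decodes the bytes as UTF-8 while the leader still claims MARC-8, and nothing I've found says which one a library should believe.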
I think the answer might be "nobody knows, and there is no standard
right way to do it." Which is unfortunate.