The discussions at the MARC standards group relating to Unicode all had
to do with using Unicode *within* ISO2709. I can't find any evidence
that MARCXML ever went through the standards process. (This may not be a
bad thing.) So none of what we know about the MARBI discussions and
resulting standards can really help us here, except perhaps by analogy.
In LC's own example on the MARCXML page (the Sandburg example) the
Leader is copied without change from the ISO2709/MARC-8 record to the
MARCXML/Unicode record -- in other words, it still has a blank in offset
09, which means "MARC-8". (The XML record is UTF-8.) My gut feeling is
that the Leader in MARCXML should be treated like the human appendix --
something that once had a use, but is now just being carried along for
historical reasons. I would not expect it to reflect the XML record
within which it is embedded. Unfortunately, it is the only source of
some key information, like type of record. The more I think about it,
the more MARCXML strikes me as a really messed-up format.
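
A rough sketch of that mismatch in Python, standard library only. The
function names and the sample record are illustrative assumptions, not
anything published by LC; the only facts relied on are that Leader/09 is
blank for MARC-8 and "a" for UCS/Unicode, and that XML defaults to UTF-8
when the declaration names no encoding.

```python
import re
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"

# Leader/09 (character coding scheme): blank = MARC-8, "a" = UCS/Unicode.
LEADER_09 = {" ": "MARC-8", "a": "UCS/Unicode"}


def declared_xml_encoding(raw: bytes) -> str:
    """Encoding named in the XML declaration; UTF-8 if none is given."""
    m = re.match(
        rb'\s*<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']', raw
    )
    return m.group(1).decode("ascii").upper() if m else "UTF-8"


def leader_09(raw: bytes) -> str:
    """What the embedded MARC Leader claims about the character coding."""
    root = ET.fromstring(raw)
    # Works whether <record> is the root or sits inside a <collection>.
    leader = root.find(f".//{{{MARC_NS}}}leader")
    text = leader.text if leader is not None and leader.text else ""
    if len(text) < 10:
        return "unknown"
    return LEADER_09.get(text[9], "unknown")


# Hypothetical sample: a UTF-8 XML record whose Leader still has a blank
# at offset 09, i.e. still says "MARC-8".
SAMPLE = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<record xmlns="http://www.loc.gov/MARC21/slim">'
    b'<leader>01142cam  2200301 a 4500</leader>'
    b'</record>'
)

print(declared_xml_encoding(SAMPLE), leader_09(SAMPLE))  # UTF-8 MARC-8
```

The two answers disagree, and only the first one means anything to an XML
parser. A tool would have to decide for itself whether to ignore, report,
or rewrite the Leader value.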
kc
On 4/17/12 11:46 AM, Jonathan Rochkind wrote:
> Thanks, this is helpful feedback at least.
>
> I think it's completely irrelevant, when determining what is legal under
> standards, to talk about what certain Java tools happen to do. I
> don't care too much what some tool you happen to use does.
>
> In this case, I'm _writing_ the tools. I want to make them do 'the right
> thing', with some mix of what's officially, legally correct and
> what's practically useful. What your Java tools do is more or less
> irrelevant to me. I certainly _could_ make my tool respect the Marc
> leader encoded in MarcXML over the XML declaration if I wanted to. I
> could even make it assume the data is Marc8 in XML, even though there's
> no XML charset type for it, if the leader says it's Marc8.
>
> But do others agree that there is in fact no legal way to have Marc8 in
> MarcXML?
>
> Do others agree that you can use non-UTF8 encodings in MarcXML, so long
> as they are legal XML?
>
> I won't even ask someone to cite standards documents, because it's
> pretty clear that LC forgot to consider this when establishing MarcXML.
> (And I have no faith that one could get LC to make a call on this and
> publish it any time this century).
>
> Has anyone seen any Marc8-encoded MarcXML in the wild? Is it common? How
> is it represented with regard to the XML declaration and the Marc leader?
>
> Has anyone seen any MarcXML with char encodings that are neither Marc8
> nor UTF8 in the wild? Are they common? How are they represented with
> regard to the XML declaration and the Marc leader?
>
> On 4/17/2012 2:32 PM, LeVan,Ralph wrote:
>>> If I want to have a MarcXML document encoded in Marc8 -- what should it
>>> look like? What should be in the XML declaration? What should be in the
>>> MARC header embedded in the XML? Or is it not in fact legal at all?
>> I'm going out on a limb here, but I don't think it is legal. There is
>> no formal encoding that corresponds to MARC-8, so there's no way to tell
>> XML tools how to interpret the bytes.
>>
>>
>>> If I want to have a MarcXML document encoded in UTF8, what should it
>>> look like? What should be in the XML declaration? What should be in the
>>> MARC header embedded in the XML?
>> <?xml version="1.0" encoding="UTF-8"?>
>>
>> I suppose you'll want to set the leader to UTF-8 as well, but it doesn't
>> really matter to any XML tools.
>>
>>
>>> If I want to have a MarcXML document with a char encoding that is
>>> _neither_ Marc8 nor UTF8, but something else generally legal for XML
--
Karen Coyle
http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet