
Re: But do others agree that there is in fact no legal way to have Marc8 in MarcXML?

No. It is perfectly legal, but you MUST declare the encoding to BE Marc8 in the XML prolog. Be aware that XML processors are only REQUIRED to process UTF-8 and UTF-16. In practice many (including Java-based ones) can handle other encodings, but you will have to make sure that whatever XML processor you use, in whatever language it is written, has a handy-dandy MARC8 coder/decoder ring.

Sheila
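A minimal sketch of the processor-dependence described above, assuming CPython's standard-library `xml.etree.ElementTree` (Expat underneath) as the XML processor. The record body is pure ASCII, so the bytes are identical in UTF-8, ISO-8859-1 and MARC-8 and only the declared encoding changes. `try_parse` and `BODY` are hypothetical names, and `MARC-8` is used purely as an illustrative encoding label. Because the standard library ships no MARC-8 codec, a prolog declaring it is rejected, while UTF-8 and ISO-8859-1 parse.

```python
import xml.etree.ElementTree as ET


def try_parse(declared_encoding: str, body: bytes) -> str:
    """Parse a tiny MarcXML document whose prolog declares `declared_encoding`."""
    doc = ('<?xml version="1.0" encoding="%s"?>' % declared_encoding).encode("ascii") + body
    try:
        ET.fromstring(doc)
        return "parsed"
    except (ET.ParseError, LookupError) as exc:
        # No codec is available for the declared name, so the bytes cannot be decoded.
        return "rejected: %s" % exc


# ASCII-only content: only the declaration differs between the three runs.
BODY = (b'<record xmlns="http://www.loc.gov/MARC21/slim">'
        b'<datafield tag="245" ind1="0" ind2="0">'
        b'<subfield code="a">Example title</subfield>'
        b'</datafield></record>')

for enc in ("UTF-8", "ISO-8859-1", "MARC-8"):
    print(enc, "->", try_parse(enc, BODY))
```

A processor with a pluggable MARC-8 codec would behave differently; the point is only that the declaration by itself does not make the bytes decodable.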

-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Jonathan Rochkind
Sent: Tuesday, April 17, 2012 2:46 PM
To: [log in to unmask]
Subject: Re: [CODE4LIB] MarcXML and char encodings

Thanks, this is helpful feedback at least.

I think it's completely irrelevant, when determining what is legal under 
the standards, to talk about what certain Java tools happen to do; I 
don't much care what some particular tool you happen to use does.

In this case, I'm _writing_ the tools. I want to make them do 'the right 
thing', with some mix of what's officially, legally correct and what's 
practically useful. What your Java tools do is more or less irrelevant 
to me. I certainly _could_ make my tool respect the encoding given in 
the MARC leader of a MarcXML record over the XML declaration if I 
wanted to. I could even make it assume the data is Marc8 in XML, even 
though there's no XML charset name for it, if the leader says it's Marc8.
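If a tool does weigh the MARC leader against the XML declaration, the first step is just reading both claims. A rough Python sketch, assuming MARC 21 leader position 09 semantics (blank = MARC-8, `a` = UCS/Unicode) and the MARCXML namespace `http://www.loc.gov/MARC21/slim`. `encoding_claims` is a hypothetical helper, and the regular expression is a simplification of the XML declaration grammar, not a full parser.

```python
import re
import xml.etree.ElementTree as ET

MARC_NS = "{http://www.loc.gov/MARC21/slim}"

# MARC 21 leader/09 (character coding scheme): blank = MARC-8, "a" = UCS/Unicode.
LEADER_09 = {" ": "MARC-8", "a": "UCS/Unicode"}

# Simplified pattern for the encoding pseudo-attribute of the XML declaration.
_DECL = re.compile(rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def encoding_claims(doc: bytes) -> tuple:
    """Return (encoding named in the XML declaration, meaning of leader/09).

    With no encoding declaration (and no BOM), XML assumes UTF-8.
    Raises if the parser has no codec for the declared encoding.
    """
    m = _DECL.match(doc)
    declared = m.group(1).decode("ascii") if m else "UTF-8"
    root = ET.fromstring(doc)
    # Works for a bare <record> or a <collection>: take the first leader found.
    leader = next(root.iter(MARC_NS + "leader"), None)
    if leader is None or leader.text is None or len(leader.text) < 10:
        return declared, "no usable leader"
    code = leader.text[9]
    return declared, LEADER_09.get(code, "unrecognized (%r)" % code)


doc = (b'<?xml version="1.0" encoding="UTF-8"?>'
       b'<record xmlns="http://www.loc.gov/MARC21/slim">'
       b'<leader>00000nam  2200000 a 4500</leader></record>')
print(encoding_claims(doc))  # declaration says UTF-8, leader says MARC-8
```

Which claim wins when they disagree is the policy question raised here; the sketch only surfaces the conflict.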

But do others agree that there is in fact no legal way to have Marc8 in 
MarcXML?

Do others agree that you can use non-UTF8 encodings in MarcXML, so long 
as they are legal XML?

I won't even ask someone to cite standards documents, because it's 
pretty clear that LC forgot to consider this when establishing MarcXML.  
(And I have no faith that one could get LC to make a call on this and 
publish it any time this century).

Has anyone seen any Marc8-encoded MarcXML in the wild? Is it common? How 
is it represented with regard to the XML declaration and the MARC leader?

Has anyone seen any MarcXML with char encodings that are neither Marc8 
nor UTF8 in the wild? Are they common? How are they represented with 
regard to the XML declaration and the MARC leader?
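For the UTF-8 case, a record in which the XML declaration and the leader agree might be produced like this. A Python sketch using `xml.etree.ElementTree` (Python 3.8+ for `xml_declaration`); `utf8_record` is a hypothetical helper, and apart from leader/09 (set to `a`, UCS/Unicode) the leader values are placeholders.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
# Serialize MARCXML elements in the default namespace (this is a global registration).
ET.register_namespace("", MARC_NS)


def utf8_record(title: str) -> bytes:
    """Build a one-field MarcXML record and serialize it as UTF-8."""
    record = ET.Element("{%s}record" % MARC_NS)
    leader = ET.SubElement(record, "{%s}leader" % MARC_NS)
    # Position 09 is "a" so the MARC-level claim matches the XML declaration.
    leader.text = "00000nam a2200000 a 4500"
    field = ET.SubElement(record, "{%s}datafield" % MARC_NS,
                          {"tag": "245", "ind1": "0", "ind2": "0"})
    sub = ET.SubElement(field, "{%s}subfield" % MARC_NS, {"code": "a"})
    sub.text = title
    # xml_declaration=True writes a declaration with version and encoding.
    return ET.tostring(record, encoding="UTF-8", xml_declaration=True)


print(utf8_record("Les Misérables").decode("utf-8"))
```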

On 4/17/2012 2:32 PM, LeVan,Ralph wrote:
>> If I want to have a MarcXML document encoded in Marc8 -- what should it
>> look like?  What should be in the XML declaration? What should be in the
>> MARC header embedded in the XML?  Or is it not in fact legal at all?
> I'm going out on a limb here, but I don't think it is legal.  There is
> no formal encoding that corresponds to MARC-8, so there's no way to tell
> XML tools how to interpret the bytes.
>
>
>> If I want to have a MarcXML document encoded in UTF8, what should it
>> look like? What should be in the XML declaration? What should be in the
>> MARC header embedded in the XML?
> <?xml version="1.0" encoding="UTF-8"?>
>
> I suppose you'll want to set the leader to UTF-8 as well, but it doesn't
> really matter to any XML tools.
>
>
>> If I want to have a MarcXML document with a char encoding that is
>> _neither_ Marc8 nor UTF8, but something else generally legal for XML