Hi Ralph,

> But, ignoring the encoding, the original MarcXML rules were the same as
> the MARC-21 rules for character repertoire and you were supposed to
> restrict yourself to characters that could be mapped back into MARC-8.
> I don't know if that rule is still in force, but everyone ignores it.

That rule no longer applies per the December 2007 revision of the MARC 21 Specifications:

	"To facilitate the movement of records between MARC-8 
	and Unicode environments, it was recommended for an 
	initial period that the use of Unicode be restricted 
	to a repertoire identical in extent to the MARC-8 
	repertoire. [...] however, such a restriction is no 
	longer appropriate. The full UCS repertoire, as currently 
	defined at the Unicode web site, is valid for encoding 
	MARC 21 records subject only to the constraints described 
	[in the current MARC 21 Specifications]."
	
	-- from MARC 21 Specifications (revised December 2007) [1]

-- Michael

[1] http://www.loc.gov/marc/specifications/speccharucs.html
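
On the question quoted below about whether to trust the XML declaration or the MARC leader: a rough Python sketch that just pulls both signals out of a MarcXML record so they can be compared. The function name encoding_signals is made up for illustration; it assumes the record uses the standard MARC21 slim namespace, and it relies on leader position 09 being "a" for UCS/Unicode and blank for MARC-8. It does not decide which signal is right.

```python
import codecs
import re
import xml.etree.ElementTree as ET

# Namespace used by MARCXML ("MARC21 slim") records.
MARC_NS = "{http://www.loc.gov/MARC21/slim}"

# Matches the encoding pseudo-attribute of a leading XML declaration.
DECL_RE = re.compile(
    rb"""^<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)

def encoding_signals(marcxml_bytes):
    """Return (declared_xml_encoding, leader_position_09) for a record.

    Hypothetical helper: either value is None when the signal is absent.
    Leader/09 is 'a' for UCS/Unicode and ' ' (blank) for MARC-8.
    """
    data = marcxml_bytes
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]  # skip a UTF-8 byte order mark
    match = DECL_RE.match(data)
    declared = match.group(1).decode("ascii") if match else None

    # The parser itself honours the XML declaration when given bytes.
    root = ET.fromstring(marcxml_bytes)
    leader_el = next(root.iter(MARC_NS + "leader"), None)
    leader = (leader_el.text or "") if leader_el is not None else ""
    leader09 = leader[9] if len(leader) > 9 else None
    return declared, leader09

sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<record xmlns="http://www.loc.gov/MARC21/slim">'
    b"<leader>00000nam a2200000 a 4500</leader>"
    b"</record>"
)
declared, leader09 = encoding_signals(sample)
print(declared, repr(leader09))
```

Once an XML parser has read the record, the text is already decoded according to the XML declaration, so a leader/09 value that disagrees with it is best treated as a warning sign about the record rather than as an instruction to re-decode.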

> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> LeVan,Ralph
> Sent: Tuesday, April 17, 2012 12:51 PM
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] MarcXML and char encodings
> 
> There are probably a couple of answers to that.
> 
> XML rules define what character set is used. The "encoding" attribute in
> the <?xml ...?> declaration is where you find out what character set is
> being used.
> 
> I've always gone under the assumption that if an encoding wasn't
> specified, then UTF-8 is in effect and that has always worked for me.
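> It turns out that US-ASCII is the default only for text/xml served over
> HTTP without a charset parameter (RFC 3023); the XML specification
> itself defaults to UTF-8 (or UTF-16 with a byte order mark).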
> 
> But, ignoring the encoding, the original MarcXML rules were the same as
> the MARC-21 rules for character repertoire and you were supposed to
> restrict yourself to characters that could be mapped back into MARC-8.
> I don't know if that rule is still in force, but everyone ignores it.
> 
> I hope that helps!
> 
> Ralph
> 
> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> Jonathan Rochkind
> Sent: Tuesday, April 17, 2012 12:35 PM
> To: [log in to unmask]
> Subject: MarcXML and char encodings
> 
> I know how char encodings work in MARC ISO binary -- the encoding can
> legally be either Marc8 or UTF8 (nothing else).  The encoding of a
> record is specified in its header. In the wild, specified encodings are
> frequently wrong, or data includes weird mixed encodings. Okay!
> 
> But what's going on with MarcXML?  What are the legal encodings for
> MarcXML?  Only Marc8 and UTF8, or anything that can be expressed in
> XML?  The MARC header is (or can be) present in MarcXML -- trust the
> MARC header, or trust the char encoding in the XML declaration?
> 
> What's the legal thing to do? What's actually found 'in the wild' with
> MarcXML?
> 
> Can anyone advise?
> 
> Jonathan