LISTSERV 16.5 - CODE4LIB Archives

Hi Jonathan,

> I tried to figure out how to add a custom encoding to ruby 1.9, with
> the idea of adding Marc8 as an actual ruby 1.9 character encoding,
> supported the same as any other built-in char encoding

Not a trivial undertaking.  Remember that the MARC-8 environment allows alternate character sets to be invoked within a MARC record using two different "escape" methods [1].  That's just one of the reasons you're not finding a bunch of these MARC-8 conversion modules, one for every language. ;-)

-- Michael

[1] Technique 1 is unique to MARC-8 and provides access to a small number of Greek symbols, subscripts, and superscripts. Technique 2 is based on the ANSI X3.41 (ISO 2022) "Code Extension Techniques for Use with 7-bit and 8-bit Character Sets" standard. See the MARC 21 Specification for details on accessing alternate graphic character sets (http://www.loc.gov/marc/specifications/speccharmarc8.html#alternative).
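To make footnote [1] concrete, here is a minimal Python sketch that splits a MARC-8 byte string into plain-text runs and escape sequences. It assumes the generic ISO 2022 escape shape: ESC (0x1B), then zero or more intermediate bytes (0x20-0x2F), then one final byte (0x30-0x7E). It also assumes, based on our reading of the MARC-8 spec, that Technique 1 sequences are ESC followed directly by g, b, p, or s. The function name scan_marc8 is hypothetical. The sketch only classifies sequences and maps nothing to Unicode. Even so, it shows why MARC-8 cannot be a simple byte-to-character table: the meaning of each byte depends on which escape came before it.

```python
from typing import List, Tuple

ESC = 0x1B
# Assumed Technique 1 final bytes (per our reading of the MARC-8 spec):
# g = Greek symbols, b = subscripts, p = superscripts, s = back to ASCII.
TECHNIQUE_1_FINALS = {ord("g"), ord("b"), ord("p"), ord("s")}


def scan_marc8(data: bytes) -> List[Tuple[str, bytes]]:
    """Split data into ("text" | "tech1" | "tech2", bytes) tokens.

    Hypothetical helper: it recognises escape sequences by their ISO 2022
    shape only and does not decode any character set.
    """
    tokens: List[Tuple[str, bytes]] = []
    i = start = 0
    while i < len(data):
        if data[i] != ESC:
            i += 1
            continue
        # Flush the plain-text run that precedes this escape.
        if start < i:
            tokens.append(("text", data[start:i]))
        j = i + 1
        # Intermediate bytes such as "(" or "$" pick the G0/G1 slot
        # and single- vs multibyte sets (Technique 2).
        while j < len(data) and 0x20 <= data[j] <= 0x2F:
            j += 1
        if j >= len(data) or not 0x30 <= data[j] <= 0x7E:
            raise ValueError(f"truncated or malformed escape at offset {i}")
        seq = data[i : j + 1]
        is_tech1 = j == i + 1 and data[j] in TECHNIQUE_1_FINALS
        tokens.append(("tech1" if is_tech1 else "tech2", seq))
        i = start = j + 1
    # Trailing text after the last escape, if any.
    if start < len(data):
        tokens.append(("text", data[start:]))
    return tokens
```

For example, scan_marc8(b"H\x1bb2\x1bsO") yields a text run "H", the Technique 1 escape ESC b, the text "2", the Technique 1 escape ESC s, and the text "O". A real converter would keep a running state of the current G0/G1 sets while walking these tokens.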
 

> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> Jonathan Rochkind
> Sent: Monday, October 24, 2011 2:01 PM
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] marc-8
> 
> What _ought_ to be easiest of all is getting our ILSs to NEVER export
> Marc8 _ever_ again.  UTF8 only.
> 
> Sadly, that only ought to be easiest.
> 
> But IMO there's no reason any of us should be dealing with Marc8 ever
> again.  The only thing that should deal in Marc8 is an ILS, and it should
> only input it, NEVER output it.  UTF8 only, please!
> 
> But this is not the world we live in.
> 
> I tried to figure out how to add a custom encoding to ruby 1.9, with
> the idea of adding Marc8 as an actual ruby 1.9 character encoding,
> supported the same as any other built-in char encoding, but I couldn't
> figure out if that was possible or how to do it.  If it were possible to
> do at that low level in ruby 1.9, it might justify the time to do it.
> 
> On 10/24/2011 2:55 PM, Doran, Michael D wrote:
> > Eric,
> >
> > Sometimes for grandpa Perl stuff -- especially as concerns charsets and/or
> > internationalization -- it's worth pinging these lists:
> >
> > 	[log in to unmask] (yes, still alive and kicking)
> >
> > 	[log in to unmask] (very low-traffic list, but some knowledgeable subscribers)
> >
> > -- Michael
> >
> >> -----Original Message-----
> >> From: Doran, Michael D
> >> Sent: Monday, October 24, 2011 1:48 PM
> >> To: 'Code for Libraries'
> >> Subject: RE: [CODE4LIB] marc-8
> >>
> >>> Okay. How do I go about converting MARC-8 encoded records into UTF-8?
> >> In Perl... using the handy MARC::Charset module (tip 'o the hat to Ed
> >> Summers, and now maintained by Galen Charlton).
> >>
> >> -- Michael
> >>
> >>> -----Original Message-----
> >>> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> >>> Eric Lease Morgan
> >>> Sent: Monday, October 24, 2011 1:39 PM
> >>> To: [log in to unmask]
> >>> Subject: Re: [CODE4LIB] marc-8
> >>>
> >>> On Oct 24, 2011, at 2:34 PM, Doran, Michael D wrote:
> >>>
> >>>>> In Perl, how do I specify MARC-8 when reading (decoding) and writing
> >>>>> (encoding) data?
> >>>> You can't.  MARC-8 is a character set that is unknown to the operating
> >>>> system.  Your best bet is to convert MARC-8-encoded records into UTF-8.
> >>>
> >>> /me throws his hands up in the air and screams!
> >>>
> >>> Okay. How do I go about converting MARC-8-encoded records into UTF-8? I
> >>> know yaz-marcdump changes the encoding bit in MARC leaders. Does it also
> >>> convert MARC-8 characters to UTF-8? (I guess I could simply try it and see
> >>> what happens.)
> >>>
> >>> --
> >>> Eric Morgan
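On Eric's question about the "encoding bit": in MARC 21, Leader position 09 (character coding scheme) is blank for MARC-8 and "a" for UCS/Unicode. The minimal Python sketch below reads that flag and flips it. Its purpose is to show that flipping the flag relabels the record without transcoding a single byte of field data. The helper names and the sample leader are hypothetical, and the code does not describe how yaz-marcdump or MARC::Charset work internally. A genuine MARC-8 to UTF-8 conversion must rewrite the field bytes as well as the leader, and must update the record length in Leader/00-04 and the directory whenever byte lengths change.

```python
LEADER_LEN = 24   # a MARC 21 leader is always 24 bytes
CODING_POS = 9    # Leader/09: character coding scheme


def coding_scheme(record: bytes) -> str:
    """Hypothetical helper: report Leader/09 as "MARC-8", "UCS/Unicode", or "unknown"."""
    if len(record) < LEADER_LEN:
        raise ValueError("record shorter than a 24-byte leader")
    flag = record[CODING_POS : CODING_POS + 1]
    return {b" ": "MARC-8", b"a": "UCS/Unicode"}.get(flag, "unknown")


def relabel_as_unicode(record: bytes) -> bytes:
    """Hypothetical helper: set Leader/09 to "a" and change nothing else.

    This is a relabel only; any MARC-8 bytes in the fields are left as-is,
    so the result is mislabelled unless the data is transcoded too.
    """
    if len(record) < LEADER_LEN:
        raise ValueError("record shorter than a 24-byte leader")
    return record[:CODING_POS] + b"a" + record[CODING_POS + 1 :]


if __name__ == "__main__":
    # Made-up leader followed by arbitrary, non-ASCII field bytes.
    rec = b"00000nam  2200000   4500" + b"\xe2e"
    print(coding_scheme(rec))                      # MARC-8
    out = relabel_as_unicode(rec)
    print(coding_scheme(out))                      # UCS/Unicode
    print(out[LEADER_LEN:] == rec[LEADER_LEN:])    # True: data untouched
```

That last line is the point: after relabelling, the non-ASCII byte 0xE2 is still there exactly as it was in the MARC-8 record. This is why "try it and see" with any tool should include checking the field data, not just the leader.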