> But I had no idea Marc8 allowed escape sequences to temporarily switch
> to a different encoding. Really? Oh my god.
For you young'uns that were "born Unicode" and are a bit foggy on the MARC-8 environment (and all its... intricacies), I did a short write-up a few years ago:
Coded Character Sets > A Technical Primer for Librarians
http://rocky.uta.edu/doran/charsets/
Feel free to skip the intro, but the "MARC-8" and "MARC Unicode" sections are short and worth a read. Plus there's a lot of bonus stuff, including "Resources on the Web" (http://rocky.uta.edu/doran/charsets/resources.html) with an emphasis on library automation and the internet environment.
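For a rough feel of what that escape-sequence switching means for anyone writing a converter, here is a minimal sketch (in Python, purely for illustration; it is not taken from MARC::Charset or any other existing module). It only splits a MARC-8 byte string into runs labelled with the character set currently designated as G0. The function name, the label strings, and the decision to reject anything it does not recognise are my own assumptions. The byte values are my reading of the MARC 21 specification and should be checked against it. G1 designations, the multibyte CJK set, and the actual mapping to Unicode are all left out.

```python
ESC = 0x1B

# Technique 1 (unique to MARC-8): ESC followed by a single byte.
TECHNIQUE_1 = {
    0x67: "greek-symbols",  # ESC g
    0x62: "subscripts",     # ESC b
    0x70: "superscripts",   # ESC p
    0x73: "basic-latin",    # ESC s (back to the ASCII default)
}

# Technique 2 (ISO 2022 style): ESC, an intermediate byte, then a final byte.
# Only the single-byte G0 intermediates "(" and "," are handled in this sketch.
G0_INTERMEDIATES = (0x28, 0x2C)
FINAL_BYTES = {
    0x42: "basic-latin",        # B
    0x32: "basic-hebrew",       # 2
    0x33: "basic-arabic",       # 3
    0x34: "extended-arabic",    # 4
    0x4E: "basic-cyrillic",     # N
    0x51: "extended-cyrillic",  # Q
    0x53: "basic-greek",        # S
}


def split_g0_runs(data: bytes):
    """Return a list of (charset_label, raw_bytes) runs (hypothetical helper)."""
    runs = []
    current = "basic-latin"  # assumed default G0 set
    buf = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if byte != ESC:
            buf.append(byte)
            i += 1
            continue
        # An escape sequence ends the current run.
        if buf:
            runs.append((current, bytes(buf)))
            buf.clear()
        nxt = data[i + 1] if i + 1 < len(data) else None
        if nxt in TECHNIQUE_1:
            current = TECHNIQUE_1[nxt]
            i += 2
        elif (nxt in G0_INTERMEDIATES and i + 2 < len(data)
              and data[i + 2] in FINAL_BYTES):
            current = FINAL_BYTES[data[i + 2]]
            i += 3
        else:
            # G1 and multibyte designations are deliberately not handled here.
            raise ValueError(f"unsupported escape sequence at offset {i}")
    if buf:
        runs.append((current, bytes(buf)))
    return runs
```

Calling split_g0_runs(b"H\x1bb2\x1bsO") yields three runs: "H" in basic Latin, "2" in the subscript set, and "O" back in basic Latin. The point is that the same byte means different things depending on which escape sequence came before it, which is why MARC-8 does not behave like an ordinary stateless encoding.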
Begging your pardon for the self-promotion,
-- Michael
> -----Original Message-----
> From: Jonathan Rochkind [mailto:[log in to unmask]]
> Sent: Monday, October 24, 2011 2:14 PM
> To: Code for Libraries
> Cc: Doran, Michael D
> Subject: Re: [CODE4LIB] marc-8
>
> Yeah, but if there's Perl code and Java code to do it, can't be _that_
> hard to port to ruby.... if I could figure out what you need to do to
> get first-class char encoding support in ruby 1.9 anyway.
>
> I mean, you could do it just as a library without that... but it's
> enough trouble that, yeah, I don't want to do it, but if the benefit was
> first-class encoding support same as any other encoding in ruby 1.9,
> that you can use with the built in tools for converting encodings and
> any library that uses em.... bigger benefit.
>
> But I had no idea Marc8 allowed escape sequences to temporarily switch
> to a different encoding. Really? Oh my god.
>
> On 10/24/2011 3:10 PM, Doran, Michael D wrote:
> > Hi Jonathan,
> >
> >> I tried to figure out how to custom add a new encoding to ruby 1.9 with
> >> the idea of adding Marc8 as an actual ruby 1.9 character encoding
> >> supported same as any other built in char encoding
> > Not a trivial undertaking. Remember that the MARC-8 environment allows
> > alternate character sets to be invoked within a MARC record using two
> > different "escape" methods [1]. Just one of the reasons why you're not
> > finding a bunch of these MARC-8 conversion modules, and one for every
> > language. ;-)
> >
> > -- Michael
> >
> > [1] Technique 1 is unique to MARC-8 and provides access to a small number
> > of Greek symbols, subscripts, and superscripts. Technique 2 is based on the
> > ANSI X3.41 (ISO 2022) "Code Extension Techniques for Use with 7-bit and
> > 8-bit Character Sets" standard. See the MARC 21 Specification for details on
> > accessing alternate graphic character sets
> > (http://www.loc.gov/marc/specifications/speccharmarc8.html#alternative).
> >
> >
> >> -----Original Message-----
> >> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> >> Jonathan Rochkind
> >> Sent: Monday, October 24, 2011 2:01 PM
> >> To: [log in to unmask]
> >> Subject: Re: [CODE4LIB] marc-8
> >>
> >> What _ought_ to be easiest of all is getting our ILS's to NEVER export
> >> Marc8 _ever_ again. UTF8 only.
> >>
> >> Sadly, that only ought to be easiest.
> >>
> >> But IMO there's no reason any of us should be dealing with Marc8 ever
> >> again. The only thing that should deal in Marc8 is an ILS, and should
> >> only input it, NEVER output it, UTF8 only, please!
> >>
> >> But this is not the world we live in.
> >>
> >> I tried to figure out how to custom add a new encoding to ruby 1.9 with
> >> the idea of adding Marc8 as an actual ruby 1.9 character encoding
> >> supported same as any other built in char encoding, but I couldn't
> >> figure out if that was possible or how to do it. If it was possible to
> >> do at that low level in ruby 1.9, it might justify the time to do it.
> >>
> >> On 10/24/2011 2:55 PM, Doran, Michael D wrote:
> >>> Eric,
> >>>
> >>> Sometimes for grandpa Perl stuff -- especially as concerns charsets and/or
> >>> internationalization -- it's worth pinging these lists:
> >>> [log in to unmask] (yes, still alive and kicking)
> >>>
> >>> [log in to unmask] (very low traffic list, but some knowledgeable
> >>> subscribers)
> >>>
> >>> -- Michael
> >>>
> >>>> -----Original Message-----
> >>>> From: Doran, Michael D
> >>>> Sent: Monday, October 24, 2011 1:48 PM
> >>>> To: 'Code for Libraries'
> >>>> Subject: RE: [CODE4LIB] marc-8
> >>>>
> >>>>> Okay. How do I go about converting MARC-8 encoded records into UTF-8?
> >>>> In Perl... using the handy MARC::Charset module (tip o' the hat to Ed
> >>>> Summers, and now maintained by Galen Charlton).
> >>>>
> >>>> -- Michael
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Eric
> >>>>> Lease Morgan
> >>>>> Sent: Monday, October 24, 2011 1:39 PM
> >>>>> To: [log in to unmask]
> >>>>> Subject: Re: [CODE4LIB] marc-8
> >>>>>
> >>>>> On Oct 24, 2011, at 2:34 PM, Doran, Michael D wrote:
> >>>>>
> >>>>>>> In Perl, how do I specify MARC-8 when reading (decoding) and writing
> >>>>>>> (encoding) data?
> >>>>>> You can't. MARC-8 is a character set that is unknown to the operating
> >>>>>> system. Your best bet is to convert MARC-8-encoded records into UTF-8.
> >>>>>
> >>>>> /me throws his hands up in the air and screams!
> >>>>>
> >>>>> Okay. How do I go about converting MARC-8 encoded records into UTF-8? I
> >>>>> know yaz-marcdump changes the encoding bit in MARC leaders. Does it also
> >>>>> convert MARC-8 characters to UTF-8? (I guess I could simply try it and see
> >>>>> what happens.)
> >>>>>
> >>>>> --
> >>>>> Eric Morgan