> But I had no idea Marc8 allowed escape sequences to temporarily switch
> to a different encoding. Really? Oh my god.

For you young'uns that were "born Unicode" and are a bit foggy on the
MARC-8 environment (and all its... intricacies), I did a short write-up a
few years ago:

Coded Character Sets > A Technical Primer for Librarians
http://rocky.uta.edu/doran/charsets/

Feel free to skip the intro, but the "MARC-8" and "MARC Unicode" sections
are short and worth a read. Plus there's a lot of bonus stuff, including
"Resources on the Web" (http://rocky.uta.edu/doran/charsets/resources.html)
with an emphasis on library automation and the internet environment.

Begging your pardon for the self-promotion,

-- Michael

> -----Original Message-----
> From: Jonathan Rochkind [mailto:[log in to unmask]]
> Sent: Monday, October 24, 2011 2:14 PM
> To: Code for Libraries
> Cc: Doran, Michael D
> Subject: Re: [CODE4LIB] marc-8
>
> Yeah, but if there's Perl code and Java code to do it, can't be _that_
> hard to port to ruby.... if I could figure out what you need to do to
> get first-class char encoding support in ruby 1.9 anyway.
>
> I mean, you could do it just as a library without that... but it's
> enough trouble that, yeah, I don't want to do it, but if the benefit was
> first-class encoding support same as any other encoding in ruby 1.9,
> that you can use with the built in tools for converting encodings and
> any library that uses em.... bigger benefit.
>
> But I had no idea Marc8 allowed escape sequences to temporarily switch
> to a different encoding. Really? Oh my god.
>
> On 10/24/2011 3:10 PM, Doran, Michael D wrote:
> > Hi Jonathan,
> >
> >> I tried to figure out how to custom add a new encoding to ruby 1.9 with
> >> the idea of adding Marc8 as an actual ruby 1.9 character encoding
> >> supported same as any other built in char encoding
> >
> > Not a trivial undertaking.
> > Remember that the MARC-8 environment allows alternate character sets
> > to be invoked within a MARC record using two different "escape"
> > methods [1]. Just one of the reasons why you're not finding a bunch of
> > these MARC-8 conversion modules, and one for every language. ;-)
> >
> > -- Michael
> >
> > [1] Technique 1 is unique to MARC-8 and provides access to a small
> > number of Greek symbols, subscripts, and superscripts. Technique 2 is
> > based on the ANSI X3.41 (ISO 2022) "Code Extension Techniques for Use
> > with 7-bit and 8-bit Character Sets" standard. See the MARC 21
> > Specification for details on accessing alternate graphic character sets
> > (http://www.loc.gov/marc/specifications/speccharmarc8.html#alternative).
> >
> >> -----Original Message-----
> >> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> >> Jonathan Rochkind
> >> Sent: Monday, October 24, 2011 2:01 PM
> >> To: [log in to unmask]
> >> Subject: Re: [CODE4LIB] marc-8
> >>
> >> What _ought_ to be easiest of all is getting our ILS's to NEVER export
> >> Marc8 _ever_ again. UTF8 only.
> >>
> >> Sadly, that only ought to be easiest.
> >>
> >> But IMO there's no reason any of us should be dealing with Marc8 ever
> >> again. The only thing that should deal in Marc8 is an ILS, and should
> >> only input it, NEVER output it, UTF8 only, please!
> >>
> >> But this is not the world we live in.
> >>
> >> I tried to figure out how to custom add a new encoding to ruby 1.9 with
> >> the idea of adding Marc8 as an actual ruby 1.9 character encoding
> >> supported same as any other built in char encoding, but I couldn't
> >> figure out if that was possible or how to do it. If it was possible to
> >> do at that low level in ruby 1.9, it might justify the time to do it.
> >>
> >> On 10/24/2011 2:55 PM, Doran, Michael D wrote:
> >>> Eric,
> >>>
> >>> Sometimes for grandpa Perl stuff -- especially as concerns charsets
> >>> and/or internationalization -- it's worth pinging these lists:
> >>>
> >>> [log in to unmask] (yes, still alive and kicking)
> >>>
> >>> [log in to unmask] (very low traffic list, but some knowledgeable
> >>> subscribers)
> >>>
> >>> -- Michael
> >>>
> >>>> -----Original Message-----
> >>>> From: Doran, Michael D
> >>>> Sent: Monday, October 24, 2011 1:48 PM
> >>>> To: 'Code for Libraries'
> >>>> Subject: RE: [CODE4LIB] marc-8
> >>>>
> >>>>> Okay. How do I go about converting MARC-8 encoded records into UTF-8?
> >>>>
> >>>> In Perl... using the handy MARC::Charset module (tip o' the hat to Ed
> >>>> Summers, and now maintained by Galen Charlton).
> >>>>
> >>>> -- Michael
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> >>>>> Eric Lease Morgan
> >>>>> Sent: Monday, October 24, 2011 1:39 PM
> >>>>> To: [log in to unmask]
> >>>>> Subject: Re: [CODE4LIB] marc-8
> >>>>>
> >>>>> On Oct 24, 2011, at 2:34 PM, Doran, Michael D wrote:
> >>>>>
> >>>>>>> In Perl, how do I specify MARC-8 when reading (decoding) and writing
> >>>>>>> (encoding) data?
> >>>>>>
> >>>>>> You can't. MARC-8 is a character set that is unknown to the operating
> >>>>>> system. Your best bet is to convert MARC-8-encoded records into UTF-8.
> >>>>>
> >>>>> /me throws his hands up in the air and screams!
> >>>>>
> >>>>> Okay. How do I go about converting MARC-8 encoded records into UTF-8? I
> >>>>> know yaz-marcdump changes the encoding bit in MARC leaders. Does it also
> >>>>> convert MARC-8 characters to UTF-8? (I guess I could simply try it and
> >>>>> see what happens.)
> >>>>>
> >>>>> --
> >>>>> Eric Morgan
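
For anyone wondering what the two "escape" methods in footnote [1] look like at the byte level, here is a rough Python sketch. It only splits a MARC-8 byte string into plain runs and escape sequences; it does not convert anything to UTF-8 (that needs the full MARC-8 code tables, which is what a module like MARC::Charset carries). The function and table names (scan_marc8 and friends) are made up for illustration, and the escape tables are written from memory of the MARC 21 specification linked in [1], so check them against the spec before relying on them.

```python
ESC = 0x1B

# Technique 1 (MARC-8 only): ESC plus a single final byte.
# Assumption: each of these is designated as the G0 set.
TECHNIQUE_1 = {
    0x67: "Greek symbols",    # ESC g
    0x62: "subscripts",       # ESC b
    0x70: "superscripts",     # ESC p
    0x73: "ASCII (default)",  # ESC s
}

# Technique 2 (ISO 2022 style): ESC, intermediate byte(s), final byte(s).
# Two-byte intermediates are listed first so they win over the bare "$".
TECHNIQUE_2_INTERMEDIATES = [
    (b"$,", "G0"), (b"$)", "G1"), (b"$-", "G1"), (b"$", "G0"),
    (b"(", "G0"), (b",", "G0"), (b")", "G1"), (b"-", "G1"),
]

TECHNIQUE_2_FINALS = {
    b"3": "Basic Arabic", b"4": "Extended Arabic",
    b"B": "Basic Latin (ASCII)", b"!E": "Extended Latin (ANSEL)",
    b"1": "CJK (EACC)", b"N": "Basic Cyrillic",
    b"Q": "Extended Cyrillic", b"S": "Basic Greek", b"2": "Basic Hebrew",
}


def _read_intermediate(data, i):
    """Return (working set, index after the intermediate bytes)."""
    for inter, gset in TECHNIQUE_2_INTERMEDIATES:
        if data.startswith(inter, i):
            return gset, i + len(inter)
    raise ValueError(f"unrecognized escape sequence at byte {i - 1}")


def _read_final(data, i):
    """Return (character set name, index after the final bytes)."""
    for width in (2, 1):  # "!E" is the only two-byte final in the table
        chunk = data[i:i + width]
        if len(chunk) == width and chunk in TECHNIQUE_2_FINALS:
            return TECHNIQUE_2_FINALS[chunk], i + width
    raise ValueError(f"unrecognized final character at byte {i}")


def scan_marc8(data):
    """Split MARC-8 bytes into ("text", bytes) and ("escape", ...) tokens."""
    tokens = []
    start = i = 0
    while i < len(data):
        if data[i] != ESC:
            i += 1
            continue
        if i > start:
            tokens.append(("text", data[start:i]))
        i += 1  # step over ESC
        if i < len(data) and data[i] in TECHNIQUE_1:
            tokens.append(("escape", "technique 1", "G0", TECHNIQUE_1[data[i]]))
            i += 1
        else:
            gset, i = _read_intermediate(data, i)
            name, i = _read_final(data, i)
            tokens.append(("escape", "technique 2", gset, name))
        start = i
    if start < len(data):
        tokens.append(("text", data[start:]))
    return tokens


if __name__ == "__main__":
    # "H", subscript "2", back to ASCII, "O", then one Cyrillic byte.
    for token in scan_marc8(b"H\x1bb2\x1bsO \x1b(NA\x1b(B!"):
        print(token)
```

Every "text" run has to be decoded against whichever sets the preceding escapes left in G0 and G1, which is why MARC-8 does not drop neatly into a language's built-in encoding machinery.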