Yeah, but if there's Perl code and Java code to do it, it can't be _that_
hard to port to Ruby... if I could figure out what you need to do to
get first-class character encoding support in Ruby 1.9, anyway.
I mean, you could do it as a plain library without that, but it's
enough trouble that I don't want to. If the payoff were first-class
support, the same as any other encoding in Ruby 1.9, usable with the
built-in tools for converting encodings and with any library that uses
them, that would be a much bigger benefit.
But I had no idea Marc8 allowed escape sequences to temporarily switch
to a different encoding. Really? Oh my god.
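
For what it's worth, here is a rough Ruby sketch of what those escape sequences look like on the wire, going by my reading of the LC spec Michael links below. It only locates the sequences in a raw byte string; it converts nothing. The names MARC8_ESCAPE, marc8_escapes and ruby_knows_marc8? are made up for illustration, and the pattern is an assumption that should be checked against the spec before anyone relies on it.

```ruby
# Sketch only. MARC8_ESCAPE, marc8_escapes and ruby_knows_marc8? are
# hypothetical names, not part of Ruby or of any MARC library.

# ESC (0x1B) followed by either:
#   Technique 2: an optional "$", one of ( , ) - and a final byte,
#                or "$" directly followed by a final byte
#   Technique 1: a single byte g, b, p or s
# (Assumed from the MARC 21 character set spec; verify before use.)
MARC8_ESCAPE = /\x1B(?:\$?[(),\-][\x30-\x7E]|\$[\x30-\x7E]|[gbps])/n

# Returns [byte_offset, sequence_without_ESC] pairs for each escape found.
def marc8_escapes(bytes)
  # Treat the input as raw bytes so high-bit characters cannot trip the scan.
  data = bytes.dup.force_encoding(Encoding::BINARY)
  found = []
  data.scan(MARC8_ESCAPE) do
    match = Regexp.last_match
    found << [match.begin(0), match[0][1..-1]]
  end
  found
end

# Ruby has no built-in encoding by this name, so the lookup raises.
def ruby_knows_marc8?
  Encoding.find("MARC-8")
  true
rescue ArgumentError
  false
end

# "abc" is placeholder data, not real Cyrillic text. As I read the spec,
# ESC ( N switches G0 to Basic Cyrillic and ESC ( B switches back to ASCII.
sample = "Title \x1B(Nabc\x1B(B more"
p marc8_escapes(sample)   # => [[6, "(N"], [12, "(B"]]
p ruby_knows_marc8?       # => false
```

The point of the sketch is that a decoder has to track which character set is currently active as it walks the bytes, which is why a simple byte-to-character table is not enough.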
On 10/24/2011 3:10 PM, Doran, Michael D wrote:
> Hi Jonathan,
>
>> I tried to figure out how to custom add a new encoding to ruby 1.9 with
>> the idea of adding Marc8 as an actual ruby 1.9 character encoding
>> supported same as any other built in char encoding
> Not a trivial undertaking. Remember that the MARC-8 environment allows alternate character sets to be invoked within a MARC record using two different "escape" methods [1]. Just one of the reasons why you're not finding a bunch of these MARC-8 conversion modules, and one for every language. ;-)
>
> -- Michael
>
> [1] Technique 1 is unique to MARC-8 and provides access to a small number of Greek symbols, subscripts, and superscripts. Technique 2 is based on the ANSI X3.41 (ISO 2022) "Code Extension Techniques for Use with 7-bit and 8-bit Character Sets" standard. See the MARC 21 Specification for details on accessing alternate graphic character sets (http://www.loc.gov/marc/specifications/speccharmarc8.html#alternative).
>
>
>> -----Original Message-----
>> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
>> Jonathan Rochkind
>> Sent: Monday, October 24, 2011 2:01 PM
>> To: [log in to unmask]
>> Subject: Re: [CODE4LIB] marc-8
>>
>> What _ought_ to be easiest of all is getting our ILSes to NEVER export
>> Marc8 _ever_ again. UTF8 only.
>>
>> Sadly, that only ought to be easiest.
>>
>> But IMO there's no reason any of us should be dealing with Marc8 ever
>> again. The only thing that should deal in Marc8 is an ILS, and should
>> only input it, NEVER output it, UTF8 only, please!
>>
>> But this is not the world we live in.
>>
>> I tried to figure out how to custom add a new encoding to ruby 1.9 with
>> the idea of adding Marc8 as an actual ruby 1.9 character encoding
>> supported same as any other built in char encoding, but I couldn't
>> figure out if that was possible or how to do it. If it was possible to
>> do at that low level in ruby 1.9, it might justify the time to do it.
>>
>> On 10/24/2011 2:55 PM, Doran, Michael D wrote:
>>> Eric,
>>>
>>> Sometimes for grandpa Perl stuff -- especially as concerns charsets and/or
>>> internationalization -- it's worth pinging these lists:
>>>
>>> [log in to unmask] (yes, still alive and kicking)
>>>
>>> [log in to unmask] (very low traffic list, but some knowledgeable
>>> subscribers)
>>>
>>> -- Michael
>>>
>>>> -----Original Message-----
>>>> From: Doran, Michael D
>>>> Sent: Monday, October 24, 2011 1:48 PM
>>>> To: 'Code for Libraries'
>>>> Subject: RE: [CODE4LIB] marc-8
>>>>
>>>>> Okay. How do I go about converting MARC-8 encoded records into UTF-8?
>>>> In Perl... using the handy MARC::Charset module (tip o' the hat to Ed
>>>> Summers, and now maintained by Galen Charlton).
>>>>
>>>> -- Michael
>>>>
>>>>> -----Original Message-----
>>>>> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
>>>>> Eric Lease Morgan
>>>>> Sent: Monday, October 24, 2011 1:39 PM
>>>>> To: [log in to unmask]
>>>>> Subject: Re: [CODE4LIB] marc-8
>>>>>
>>>>> On Oct 24, 2011, at 2:34 PM, Doran, Michael D wrote:
>>>>>
>>>>>>> In Perl, how do I specify MARC-8 when reading (decoding) and writing
>>>>>>> (encoding) data?
>>>>>> You can't. MARC-8 is a character set that is unknown to the operating
>>>>>> system. Your best bet is to convert MARC-8-encoded records into UTF-8.
>>>>>
>>>>> /me throws his hands up in the air and screams!
>>>>>
>>>>> Okay. How do I go about converting MARC-8 encoded records into UTF-8? I
>>>>> know yaz-marcdump changes the encoding bit in MARC leaders. Does it also
>>>>> convert MARC-8 characters to UTF-8? (I guess I could simply try it and see
>>>>> what happens.)
>>>>>
>>>>> --
>>>>> Eric Morgan