On 10/24/2011 2:52 PM, Ross Singer wrote:
> On Mon, Oct 24, 2011 at 7:39 PM, Eric Lease Morgan wrote:
>
>> Okay. How do I go about converting MARC-8 encoded records into UTF-8? I know yaz-marcdump changes the encoding bit in MARC leaders. Does it also convert MARC-8 characters to UTF-8? (I guess I could simply try it and see what happens.)
>>
> Yes, it does. It uses yaz-iconv. Theoretically, you could wrap some
> Perl module around that. I've contemplated it for ruby-marc, but then
> it always seems a lot easier to ignore it and delete any emails that
> request it.
Or use JRuby, where you can use MARC4J. Or actually port either the
Java or (apparently?) Perl version into ruby; okay, that one is not
"easier" than anything in the short term, but in the long term I'd
rather have pure ruby than something that relies on an external bash
call or a C extension. In my experience, the latter are invariably
annoying and confusing to maintain down the line.
But I'm not doing any of these things anytime soon either. So far, all my
ruby code that deals with MARC has something else convert it first. (In
my largest case, Java MARC4J converts it before it's stored in a Solr
stored field, and my ruby only gets it from that field, already
converted.)
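For anyone who only needs the leader flag Eric mentioned, here's a minimal pure-Ruby sketch. It assumes you have the raw ISO 2709 bytes of a single record; it is not a ruby-marc API, and the helper names (coding_scheme, mark_as_unicode) are made up for illustration. Leader position 09 holds the character coding scheme: blank means MARC-8, "a" means UCS/Unicode. It only reads or sets that flag; it does not transcode a single character, which is the hard part yaz-iconv or MARC4J actually does.

```ruby
# Hypothetical helpers (not part of ruby-marc): inspect or set the MARC 21
# leader's character coding scheme byte, position 09.
#   " " (blank) => MARC-8
#   "a"         => UCS/Unicode
# This touches only the flag; it does NOT convert any characters.

LEADER_LENGTH = 24
CODING_SCHEME_POS = 9

# Returns :marc8, :unicode, or :unknown for a raw ISO 2709 record string.
def coding_scheme(raw_record)
  if raw_record.bytesize < LEADER_LENGTH
    raise ArgumentError, "record is shorter than a MARC leader"
  end

  case raw_record.byteslice(CODING_SCHEME_POS, 1)
  when " " then :marc8
  when "a" then :unicode
  else :unknown
  end
end

# Returns a binary copy of the record with leader/09 set to "a".
# Only meaningful if the record body has already been converted to UTF-8.
def mark_as_unicode(raw_record)
  copy = raw_record.b # binary (ASCII-8BIT) copy so byte offsets are exact
  copy.setbyte(CODING_SCHEME_POS, "a".ord)
  copy
end
```

Setting the flag without converting the body would just mislabel the record, so mark_as_unicode should only run after the bytes really are UTF-8.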