LISTSERV 16.5 - CODE4LIB Archives

Yitzchak:

This is great news! I'd love to see you share the code with the
greater community. This may prove particularly useful for the
automated addition of non-Roman data into authority records for NACO
members (see [1]; see also [2]). As far as other algorithms go, you
could try getting in touch with Dave Reser at LC's CPSO. You may also
want to look at IITM's "Local Language Editor" [3].

[1] http://www.loc.gov/catdir/cpso/nonlatinfaq.html
[2] http://www.loc.gov/catdir/cpso/nonlatin.pdf
[3] http://acharya.iitm.ac.in/software/r2leditor.php

Mark A. Matienzo
Applications Developer, NYPL Labs
The New York Public Library
+1 (212) 592-7176

On Fri, Aug 15, 2008 at 10:32 AM, Yitzchak Schaffer wrote:
> BS"D
>
> Greetings all:
>
> It occurs to me now that I should have checked the lists for existing work
> before I did this, but anyway -- we are in the finishing stages of creating
> scripts that will automatically convert a library's existing Romanized MARC
> Hebrew fields (e.g. "Sefer {dotb}Hatan Torah") into Hebrew-script, and add
> them to the records already in the ILS.  It's quite accurate; not
> bulletproof, but at least it's a way to quickly get Hebrew script into
> thousands of Roman-only records, where many Hebrew users (including staff)
> may not understand the transliteration rules 100%.
>
> The Hebrew conversion itself is done by a PHP script (I haven't finished
> learning Perl) acting on a MARC dump of Roman-only Hebrew records in MRK
> format (MARC "broken" into mnemonic text by MarcEdit's MarcBreaker).  This
> outputs two files of converted fields: an
> XML file for proofing, and a tab-delimited text file for the inputting
> script to devour.  This inputting is done by an Expect script using the
> character-based ILS client.
>
> We are an III shop.  This could presumably be adapted easily enough for
> another ILS, whether using Expect or direct manipulation of database tables.
> (I'm not volunteering, though...) It would probably be easy enough to adapt
> to another language also, assuming that language were at least as
> predictable in MARC as Hebrew.  (It's pretty good - my list of "manual
> override" words that the auto-algorithm botches now totals about 35 in
> preliminary testing.)
>
> Note that I can't imagine automating the other direction, Hebrew- to
> Roman-script, unless there's some algorithm for this already floating around
> out there.
>
> If anyone's interested, I'll clean up the code and open-source it.
>
> Cheers, Shabbat shalom,
>
> --
> Yitzchak Schaffer
> Systems Librarian
> Touro College Libraries
> 33 West 23rd Street
> New York, NY 10010
> Tel (212) 463-0400 x5230
> Fax (212) 627-3197
>
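The conversion step described in the quoted message (MRK dump in, "manual override" list consulted, tab-delimited lines out for the inputting script) can be sketched roughly as follows. This is not Yitzchak's code, which is PHP and was not posted; it is a minimal Python illustration under stated assumptions. The `OVERRIDES` entries, the `flag_for_proofing` stub standing in for the unpublished rule-based converter, the choice of 245 $a as the target, and the three-column output layout are all hypothetical. It also assumes MarcBreaker's MRK conventions: each field line starts with `=TAG` plus two spaces, blank indicators appear as `\`, and `$` delimits subfields.

```python
import re

# Hypothetical override table: lowercased romanized word (MRK mnemonics such
# as {dotb} kept) -> Hebrew script. The post's real list has about 35 words.
OVERRIDES = {
    "sefer": "ספר",
    "{dotb}hatan": "חתן",
    "torah": "תורה",
}


def flag_for_proofing(word):
    """Hypothetical stub for the unpublished rule-based converter: mark for review."""
    return "[?" + word + "]"


def parse_mrk_field(line):
    """Split an MRK line like '=245  10$aTitle' into (tag, indicators, subfields)."""
    m = re.match(r"=(\w{3})  (.*)$", line)
    if not m:
        return None
    tag, rest = m.group(1), m.group(2)
    if tag == "LDR" or tag < "010":
        # Leader and control fields (001-009) carry no indicators or subfields.
        return tag, "", [("", rest)]
    indicators, body = rest[:2], rest[2:]
    # Assumes no literal dollar signs (MarcBreaker writes those as {dollar}).
    subfields = [(chunk[0], chunk[1:]) for chunk in body.split("$") if chunk]
    return tag, indicators, subfields


def convert_word(word, rules=flag_for_proofing):
    """Check the override list first, then fall back to the rule function."""
    if not any(ch.isalpha() for ch in word):
        return word  # punctuation such as ISBD " /" passes through unchanged
    key = word.lower()
    if key in OVERRIDES:
        return OVERRIDES[key]
    return rules(word)


def convert_text(text, rules=flag_for_proofing):
    """Convert a romanized string word by word, preserving word order."""
    return " ".join(convert_word(w, rules) for w in text.split())


def records_to_tsv(mrk_text, target_tags=("245",), rules=flag_for_proofing):
    """Yield 'control number<TAB>tag<TAB>Hebrew text' for each target field."""
    for record in mrk_text.strip().split("\n\n"):  # blank line between records
        fields = [f for f in map(parse_mrk_field, record.splitlines()) if f]
        control = next((subs[0][1] for tag, _, subs in fields if tag == "001"), "")
        for tag, _, subfields in fields:
            if tag in target_tags:
                text = " ".join(value for code, value in subfields if code == "a")
                yield "\t".join([control, tag, convert_text(text, rules)])


# Made-up sample record in MRK form, using the title from the post.
sample = """=LDR  00000nam a2200000 a 4500
=001  12345
=245  10$aSefer {dotb}Hatan Torah /"""

for line in records_to_tsv(sample):
    print(line)
```

Keeping the override table separate from the rule function mirrors the post's approach of a short "manual override" list layered on top of an automatic algorithm: real conversion rules could be plugged in by passing a different `rules` callable, and any word still wrapped in `[?...]` would surface during the proofing pass.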