LISTSERV 16.5 - CODE4LIB Archives

Very interesting, Ralph. Are you / OCLC offering that code under any particular license(s)? 

(The Evergreen code, for what it's worth, has a project-level license stating that Evergreen code is offered under the GPL v2 with the "or later" clause).

>>> "LeVan,Ralph" <[log in to unmask]> 4/11/2012 12:04 PM >>>
I'm pretty sure attachments don't work on the list, so I'm just pasting
my NACO normalizer below.  Note that there are 2007 versions of the
normalize() method in there.  This is used for all the VIAF and
Identities indexing.

Ralph

/*
* NacoNormalize.java
*
* Created on July 11, 2007, 10:52 AM
*
*/

package ORG.oclc.util;

-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Bill Dueber
Sent: Wednesday, April 11, 2012 11:27 AM
To: [log in to unmask]
Subject: Modern NACO Normalization (esp. in java?)

I'm about to embark on trying to write code to apply NACO normalization
to strings (not for field-to-field comparisons, but for correctly sorting
things). I was driven to this by a complaint about how some Arabic
manuscript titles are sorting.
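The core of a sort-oriented normalization can be sketched with the JDK's java.text.Normalizer and regular expressions. This is a minimal sketch, not Ralph's or OCLC's code: the class and method names are hypothetical, and it covers only a few NACO-style steps (decompose and drop diacritics, turn punctuation into blanks, collapse blanks, uppercase), leaving out the rest of the rule set such as special-letter conversions and punctuation exceptions.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/**
 * Minimal sketch of a NACO-style sort key (hypothetical names). It applies
 * only a few normalization steps and is NOT a complete implementation.
 */
final class NacoSortKeySketch {

    static String sortKey(String s) {
        // Decompose so an accented letter becomes base letter + combining mark(s).
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Drop combining marks (Unicode general category M), e.g. accents, Arabic harakat.
        String noMarks = decomposed.replaceAll("\\p{M}+", "");
        // Turn anything that is not a letter, digit or whitespace into a blank.
        String noPunct = noMarks.replaceAll("[^\\p{L}\\p{N}\\s]", " ");
        // Collapse runs of whitespace to a single blank and trim both ends.
        String collapsed = noPunct.replaceAll("\\s+", " ").trim();
        // Case-fold; Locale.ROOT avoids locale-specific rules such as Turkish dotless i.
        return collapsed.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        List<String> titles = new ArrayList<>(Arrays.asList(
                "\u00C9mile", "eclair", "\"Zeta\" -- notes", "\u00C5lborg"));
        // Sort on the normalized key rather than on the raw string.
        titles.sort(Comparator.comparing(NacoSortKeySketch::sortKey));
        System.out.println(titles);
    }
}
```

Sorting on sortKey() compares the keys in plain String order; whether that ordering is good enough for the Arabic titles in question would still need checking.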

My end goal is a Solr filter, so I'm most interested in Java code.

It doesn't look "hard" so much as "long and error-prone", so I'm hoping
someone has already done this (or at least has a character map that I
can easily convert to Java).
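A character map converts to Java fairly naturally as a Map from char to replacement string. The sketch below is hypothetical: the entries are an illustrative subset of the letters the NACO rules convert (assumed here, and to be checked against the published tables), chosen because they have no Unicode decomposition and so survive a simple strip-the-diacritics pass.

```java
import java.util.Map;

/**
 * Hypothetical sketch of a NACO-style character map. The entries are a small
 * illustrative subset, assumed rather than copied from the published rules.
 */
final class NacoCharMapSketch {

    // Letters with no canonical decomposition, so NFD + mark stripping leaves
    // them untouched. Values are uppercase on the assumption that the final
    // key is uppercased anyway.
    private static final Map<Character, String> SPECIAL_LETTERS = Map.ofEntries(
            Map.entry('\u00C6', "AE"), // AE ligature, capital
            Map.entry('\u00E6', "AE"), // ae ligature, small
            Map.entry('\u0152', "OE"), // OE ligature, capital
            Map.entry('\u0153', "OE"), // oe ligature, small
            Map.entry('\u00D8', "O"),  // O with stroke, capital
            Map.entry('\u00F8', "O"),  // o with stroke, small
            Map.entry('\u0141', "L"),  // L with stroke, capital
            Map.entry('\u0142', "L"),  // l with stroke, small
            Map.entry('\u0110', "D"),  // D with stroke, capital
            Map.entry('\u0111', "D"),  // d with stroke, small
            Map.entry('\u00DE', "TH"), // thorn, capital
            Map.entry('\u00FE', "TH")  // thorn, small
    );

    static String mapSpecialLetters(String s) {
        StringBuilder out = new StringBuilder(s.length());
        // Every key is a single BMP char, so walking char by char is safe:
        // surrogate halves never match a key and are copied through unchanged.
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            String replacement = SPECIAL_LETTERS.get(c);
            out.append(replacement != null ? replacement : String.valueOf(c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(mapSpecialLetters("\u00C6sop, \u0141\u00F3d\u017A"));
    }
}
```

Letters such as the o-acute and z-acute in the example are deliberately left alone by this step; they are meant for a separate decompose-and-strip pass.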

I've seen the code at
OCLC <http://www.oclc.org/research/activities/naco/default.htm>,
but it's 10 years old and doesn't have a lot of the non-Latin stuff in it.

Evergreen has a Perl implementation
<http://git.evergreen-ils.org/?p=Evergreen.git;a=blob;f=Open-ILS/src/perlmods/lib/OpenILS/Utils/Normalize.pm>:
that's probably where I'll start if no one has anything else.
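If the Perl implementation is the starting point, two Perl idioms often used for this kind of character munging, tr/// transliteration and substitutions with Unicode property classes such as \p{Mn}, have close Java equivalents. The helpers below are a hypothetical sketch of those two translations, not a port of Normalize.pm; tr() only covers the no-modifier form without ranges.

```java
import java.text.Normalizer;

/**
 * Hypothetical helpers showing how two Perl idioms map onto Java when porting
 * a normalizer. This is not a translation of Normalize.pm itself.
 */
final class PerlPortSketch {

    /**
     * Mimics Perl's tr/SEARCH/REPLACE/ with no modifiers, for BMP characters
     * only and without Perl's range syntax such as a-z.
     */
    static String tr(String s, String search, String replace) {
        if (replace.isEmpty()) {
            return s; // Perl's tr/abc// only counts matches; the string is unchanged
        }
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int idx = search.indexOf(c); // first occurrence wins, as in Perl
            if (idx < 0) {
                out.append(c);
            } else {
                // A shorter REPLACE list repeats its last character, as in Perl.
                out.append(replace.charAt(Math.min(idx, replace.length() - 1)));
            }
        }
        return out.toString();
    }

    /** Equivalent of Perl's NFD() (Unicode::Normalize) followed by s/\p{Mn}//g. */
    static String stripNonspacingMarks(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{Mn}", "");
    }

    public static void main(String[] args) {
        System.out.println(tr("\u0110or\u0111e", "\u0110\u0111", "Dd"));
        System.out.println(stripNonspacingMarks("Caf\u00E9"));
    }
}
```

Because java.util.regex understands the same Unicode general-category names as Perl (\p{Mn}, \p{Lm}, and so on), substitutions written that way can usually be carried over with only the backslashes doubled.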

Anyone?


--
Bill Dueber
Library Systems Programmer
University of Michigan Library