I've put that code into Google Code.  It's been tidied slightly and I
added our JUnit test for it as well.

http://code.google.com/p/oclcnaconormalizer/

Ralph

-----Original Message-----
From: LeVan,Ralph 
Sent: Wednesday, April 11, 2012 1:24 PM
To: [log in to unmask]
Subject: RE: Modern NACO Normalization (esp. in java?)

Apache 2.

To cover my butt, this code was originally released as part of our
SiteSearch product which we made Open Source.  This is just the latest
incarnation, but just as open.

Stick this at the top of that code, if you expect to reuse it.

Thanks for asking!

Ralph

/**
 * Copyright 2012 OCLC Online Computer Library Center, Inc.
 *
 *  Licensed under the Apache License, Version 2.0 (the "License");
 *  you may not use this file except in compliance with the License.
 *  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 *  Unless required by applicable law or agreed to in writing, software
 *  distributed under the License is distributed on an "AS IS" BASIS,
 *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 *  See the License for the specific language governing permissions and
 *  limitations under the License.
 *
 * The Original Code is ____NacoNormalize.java________.
 * The Initial Developer of the Original Code is __Ralph LeVan__.
 */


-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
Dan Scott
Sent: Wednesday, April 11, 2012 12:36 PM
To: [log in to unmask]
Subject: Re: Modern NACO Normalization (esp. in java?)

Very interesting, Ralph. Are you / OCLC offering that code under any
particular license(s)? 

(The Evergreen code, for what it's worth, has a project-level license
stating that Evergreen code is offered under the GPL v2 with the "or
later" clause).

>>> "LeVan,Ralph" <[log in to unmask]> 4/11/2012 12:04 PM >>>
I'm pretty sure attachments don't work on the list, so I'm just pasting
my NACO normalizer below.  Note that there are 2007 versions of the
normalize() method in there.  This is used for all the VIAF and
Identities indexing.

Ralph

/*
* NacoNormalize.java
*
* Created on July 11, 2007, 10:52 AM
*
* To change this template, choose Tools | Template Manager
* and open the template in the editor.
*/

package ORG.oclc.util;

-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
Bill Dueber
Sent: Wednesday, April 11, 2012 11:27 AM
To: [log in to unmask]
Subject: Modern NACO Normalization (esp. in java?)

I'm about to embark on trying to write code to apply NACO normalization
to strings (not for field-to-field comparisons, but for correctly sorting
things). I was driven to this by a complaint about how some Arabic
manuscript titles are sorting.

My end goal is a Solr filter, so I'm most interested in Java code.

It doesn't look "hard" so much as "long and error-prone" so I'm hoping
someone has already done this (or at least has a character map that I
can easily convert to Java).
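
To make the "long and error-prone" part concrete, here is a minimal Java
sketch of the general shape such a normalizer might take for sort keys.
It is an assumption-laden simplification, not the NACO rules themselves:
the class name NacoSketch is hypothetical, every non-alphanumeric
character is turned into a blank (the real rules delete some characters
and blank others), and it relies on Unicode decomposition rather than a
hand-built character map, so letters that do not decompose are left
untouched.

```java
import java.text.Normalizer;
import java.util.Locale;

/**
 * Hypothetical, simplified normalizer for building sort keys.
 * This is NOT a complete implementation of the NACO rules.
 */
final class NacoSketch {
    private NacoSketch() {}

    static String normalize(String input) {
        if (input == null) {
            return "";
        }
        // Decompose so base letters and combining marks are separate code points.
        String s = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Drop all combining marks (Unicode general category M).
        s = s.replaceAll("\\p{M}+", "");
        // Fold to a single case; Locale.ROOT avoids locale-specific surprises.
        s = s.toLowerCase(Locale.ROOT);
        // Simplification: any run of non-letter, non-digit characters becomes one blank.
        s = s.replaceAll("[^\\p{L}\\p{N}]+", " ");
        // Remove leading and trailing blanks.
        return s.trim();
    }
}
```

With this sketch, NacoSketch.normalize("Café  Société") gives
"cafe societe". A real implementation would still need the explicit
character map for letters that have no Unicode decomposition and for
the punctuation that NACO deletes rather than blanks.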

I've seen the code at
OCLC<http://www.oclc.org/research/activities/naco/default.htm>,
but it's 10 years old and doesn't have a lot of the non-Latin stuff in
it.

Evergreen has a Perl
implementation<http://git.evergreen-ils.org/?p=Evergreen.git;a=blob;f=Open-ILS/src/perlmods/lib/OpenILS/Utils/Normalize.pm>;
that's probably where I'll start if no one has anything else.

Anyone?


--
Bill Dueber
Library Systems Programmer
University of Michigan Library