I'm about to embark on trying to write code to apply NACO normalization to
strings (not for field-to-field comparisons, but for correctly sorting
things). I was driven to this by a complaint about how some Arabic
manuscript titles are sorting.
My end goal is a Solr filter, so I'm most interested in Java code.
It doesn't look "hard" so much as "long and error-prone", so I'm hoping
someone has already done this (or at least has a character map that I can
easily convert to Java).
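
To make "character map" concrete, here is a rough sketch of the shape I have in mind. It is not the NACO rule set: the class name NacoSketch and the handful of map entries are placeholders for illustration, and the uppercase/strip-marks/collapse-blanks steps are assumptions about the general approach that have not been checked against the NACO tables. It also walks the string one Java char at a time, so it ignores characters outside the Basic Multilingual Plane.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class NacoSketch {
    // Illustrative entries only; a real NACO map would be far longer
    // and would have to be checked against the published rules.
    private static final Map<Character, String> CHAR_MAP = new HashMap<>();
    static {
        CHAR_MAP.put('\u00C6', "AE"); // LATIN CAPITAL LETTER AE
        CHAR_MAP.put('\u00E6', "ae"); // LATIN SMALL LETTER AE
        CHAR_MAP.put('\u0141', "L");  // LATIN CAPITAL LETTER L WITH STROKE
        CHAR_MAP.put('\u0142', "l");  // LATIN SMALL LETTER L WITH STROKE
        CHAR_MAP.put('\'', "");       // apostrophe: dropped (assumed)
        CHAR_MAP.put('-', " ");       // hyphen: becomes a blank (assumed)
    }

    public static String normalize(String input) {
        // Decompose so that accents and vowel marks become separate
        // combining characters that can be dropped.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        StringBuilder sb = new StringBuilder(decomposed.length());
        for (int i = 0; i < decomposed.length(); i++) {
            char c = decomposed.charAt(i);
            if (Character.getType(c) == Character.NON_SPACING_MARK) {
                continue; // skip combining marks (Latin accents, Arabic harakat, ...)
            }
            String mapped = CHAR_MAP.get(c);
            sb.append(mapped != null ? mapped : String.valueOf(c));
        }
        // Fold case, collapse runs of whitespace, trim the ends.
        return sb.toString()
                 .toUpperCase(Locale.ROOT)
                 .replaceAll("\\s+", " ")
                 .trim();
    }
}
```

Wrapping something like normalize() in a Solr filter would be a separate step; the sketch only covers the string-to-string part.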
I've seen the code at
OCLC <http://www.oclc.org/research/activities/naco/default.htm>,
but it's 10 years old and doesn't have a lot of the non-Latin stuff in it.
Evergreen has a Perl
implementation <http://git.evergreen-ils.org/?p=Evergreen.git;a=blob;f=Open-ILS/src/perlmods/lib/OpenILS/Utils/Normalize.pm>;
that's probably where I'll start if no one has anything else.
Anyone?
--
Bill Dueber
Library Systems Programmer
University of Michigan Library