By sending this here I hope to reach everyone on the Blacklight,
VuFind, and SolrMarc mailing lists, and maybe some other
interested parties.
Our East Asian Languages Librarian has approached me with a problem he
wants to see solved. According to him, the typical North American
library cataloging rules for constructing Pinyin transliterations are
different from the rules that are used in China. What this means is
that native Chinese speakers have a lot of trouble searching our
catalog ("practically unusable" were his exact words). His
proposal, and I think it's a good one, is that since we're re-indexing
our records into Solr anyway, we could apply an algorithm at index
time to convert North American Pinyin to Chinese-rules Pinyin,
index both values, and thus make the catalog much more useful to an
under-served population. This seems like a great suggestion to me, but
before I start devoting development cycles to it I wanted to poll the
community... is there a more obvious answer that I'm not seeing? Has
anyone solved this already?
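
To make the idea concrete, here is a rough sketch of the "index both values" step in Python. The function names are hypothetical, and the conversion itself is only a placeholder: it simply joins space-separated syllables so that the example runs, which is my assumption and not the actual rule set our librarian has in mind. A real converter would need proper word-division rules.

```python
import re


def to_chinese_rules_pinyin(value: str) -> str:
    """HYPOTHETICAL placeholder for the real conversion algorithm.

    Assumption for illustration only: join whitespace-separated
    syllables into a single token. The real rules are not specified
    here and would replace this function.
    """
    return re.sub(r"\s+", "", value.strip())


def pinyin_index_values(value: str) -> list[str]:
    """Return every form of a transliterated field that should be indexed."""
    # Normalize runs of whitespace so the original form is stored cleanly.
    original = " ".join(value.split())
    converted = to_chinese_rules_pinyin(original)
    # Index both forms; skip the duplicate when conversion changes nothing.
    if converted == original:
        return [original]
    return [original, converted]
```

Each returned value would go into the same multi-valued search field, so a query in either style could match the record.
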
What's the right place for such a piece of code? SolrMarc seems the
obvious place to me. As it has been described to me so far, this
doesn't seem like an issue affecting people outside the library realm,
which makes it seem too niche and community-specific to get built
into the Lucene codebase, but I could be wrong about that. Maybe it
would be better as a Lucene contrib library?
So, thoughts? Anyone know more about this than I do and want to speak
up?
Thanks!
Bess