LISTSERV 16.5 - CODE4LIB Archives

I'm sending this here in the hope of reaching everyone on the
Blacklight, VuFind, and SolrMarc mailing lists, and maybe some other
interested parties.

Our East Asian Languages Librarian has approached me with a problem he
wants to see solved. According to him, the typical North American
library cataloging rules for constructing Pinyin transliterations are
different from the rules that are used in China. What this means is
that native Chinese speakers have a lot of trouble searching our
catalog ("practically unusable" were his exact words). His proposal,
and I think it's a good one, is that since we're re-indexing our
records into Solr anyway, we could apply an algorithm at index time
to convert North American Pinyin to Chinese-rules Pinyin, index both
values, and thus make the catalog much more useful to an under-served
population. This seems like a great suggestion to me, but before I
start devoting development cycles to it I wanted to poll the
community: is there a more obvious answer that I'm not seeing? Has
anyone solved this already?
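
To make the "index both values" idea concrete, here is a minimal
Python sketch. It rests on an assumption that the proposal above does
not actually state: that the main difference between the two
conventions is word division, with North American cataloging
separating most syllables with spaces and Chinese practice joining the
syllables of a word. LEXICON, join_syllables, and index_values are
hypothetical names invented for illustration; a real converter would
need a proper word list or segmenter, and SolrMarc itself is Java, so
this is only a sketch of the logic, not of where it would live.

```python
# Hypothetical lexicon of multi-syllable words, keyed by their syllables.
# A real implementation would need a far larger, vetted word list.
LEXICON = {
    ("li", "shi"),
    ("yan", "jiu"),
    ("tu", "shu", "guan"),
}
MAX_WORD_LEN = max(len(word) for word in LEXICON)


def join_syllables(text):
    """Join space-separated syllables into words by greedy longest match.

    ASSUMPTION: the two conventions differ mainly in word division.
    """
    syllables = text.split()
    words = []
    i = 0
    while i < len(syllables):
        longest = min(MAX_WORD_LEN, len(syllables) - i)
        # Try the longest candidate first, down to two syllables.
        for n in range(longest, 1, -1):
            window = tuple(s.lower() for s in syllables[i:i + n])
            if window in LEXICON:
                words.append("".join(syllables[i:i + n]))
                i += n
                break
        else:
            # No lexicon entry starts here; keep the syllable unchanged.
            words.append(syllables[i])
            i += 1
    return " ".join(words)


def index_values(text):
    """Return the values to index: the original, plus the converted
    form when it differs, so a search in either convention can match."""
    converted = join_syllables(text)
    return [text] if converted == text else [text, converted]
```

With the toy lexicon above, index_values("tu shu guan") yields both
"tu shu guan" and "tushuguan", so either spelling of a query would
find the record.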

What's the right place for such a piece of code? SolrMarc seems the
obvious place to me. As it has been described to me so far, this
doesn't seem like an issue affecting people outside the library realm,
which makes it seem too niche and community-specific to be built into
the Lucene codebase, but I could be wrong about that. Maybe it would
be better as a Lucene contrib library?

So, thoughts? Anyone know more about this than I do and want to speak  
up?

Thanks!

Bess