You could theoretically use Solr synonyms to expand the actual sharp
character (♯) to BOTH "#" AND "sharp" at index time. I guess at query
time you'd need to expand it to just one or the other -- I think
expanding to two things at query time is going to be a mess. I haven't
tried using Solr synonyms to expand something to more than one
alternative myself. I think the Solr synonym analyzer supports it, but I
expect, based on what I know of how Solr works, that there will be some
gotchas.
I _do_ notice actual sharp and flat symbols in my library MARC data for
musical pieces; catalogers apparently do enter them sometimes. As most
users probably don't know how to (or won't think to) enter sharp and
flat characters directly, if it's important that these titles be
findable including the sharp/flat part, it seems like something has to
be done. But I haven't gotten to it yet. (Unless maybe all these library
records already have alternate titles listed in 246 or whatever using
straight ASCII of some kind, I don't know.)
In general, I've been able to avoid having to expand to multiple
synonyms -- but I can't really do that with ♯, #, 'sharp', I think,
precisely because '#' is not always a sharp sign, it can be other things
too, so you don't want to collapse all....
Wait, maybe just map ♯ to "#", at both query and index time? Then a user
can't search for "F sharp", but they can search for either "F♯" or "F#",
and both will match the original source "F♯". That seems the simplest
solution. Although it would still be neat to play around with synonym
expansion to see if you can make "F sharp" at query time match too.
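
To make the "map ♯ to # on both sides" idea concrete, here is a toy
sketch in Python. It is NOT Solr and not anything I've run against a
real index; the analyze() and matches() functions are made up for
illustration, and it assumes plain whitespace tokenizing so that "F#"
survives as one token.

```python
# Toy model only: this imitates the idea, it is not Solr's analysis chain.
# Assumption: tokens are split on whitespace, so "F#" stays a single token.
SHARP_MAP = str.maketrans({"\u266f": "#"})  # U+266F MUSIC SHARP SIGN -> "#"


def analyze(text):
    """Normalize the sharp sign, lowercase, and split on whitespace.

    The same function is used for indexed titles and for queries,
    which is the whole point of the approach.
    """
    return text.translate(SHARP_MAP).lower().split()


def matches(query, indexed_title):
    """Return True if every query token is among the indexed tokens."""
    indexed = set(analyze(indexed_title))
    return all(token in indexed for token in analyze(query))


title = "Concerto in F\u266f minor"  # source record uses the real sharp sign

print(matches("F\u266f", title))  # query typed with the real sharp sign
print(matches("F#", title))       # query typed with a number sign
print(matches("F sharp", title))  # spelled-out form is not handled here
```

The first two queries match and the third does not, which is the
trade-off described above: "F sharp" would still need something like
synonym expansion.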
On 5/31/2011 12:05 PM, Thomas Dowling wrote:
> Many thanks.
>
> I like the idea of catching the sharp and flat symbols - the only problem
> is that lazy music students tend to use "#" and "b". ("Concerto in F#
> minor for Bb Bass Clarinet").
>
> Thomas
>
>
> On 05/31/2011 11:59 AM, Jonathan Rochkind wrote:
>> Multi-word synonyms are tricky.
>>
>> You probably want to make sure this synonym is only expanded at index
>> time, and not at search time. See some background in the
>> SynonymFilterFactory section of
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>>
>> I think the synonym approach is a fine way to search for greek letters by
>> name; it's possible some of the new Unicode stuff in Solr 3.1 might expand
>> greek letters too, but I think actually probably not (because you don't
>> necessarily want that in the general case), I think synonyms is probably
>> your best bet. (Same for things like expanding the musical sharp or flat
>> glyph to "sharp" or "flat", which I've considered).
>>