Has anyone already given some thought to refining the Solr stopwords.txt for library collections, particularly finding aids? Whether the words in the out-of-the-box stopwords.txt are really unimportant seems highly questionable:
<an and are as at be but by for if in into is it no not of on or s such t that the their then there these they this to was will with>
We were indexing an ID field with "no." (for "number") as one of its tokens, and wanted a query for "no" (where the person did not type the period) to find the document. In practice, though, "no" was stripped by the StopFilterFactory. That is how we stumbled upon this list, and we were a bit surprised by some of the inclusions (e.g., "will") and exclusions (e.g., "a").
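To illustrate why a query for "no" finds nothing, here is a minimal Python sketch. It is not Solr's code, only a rough approximation of an analysis chain (split on non-alphanumeric characters, lowercase, drop stopwords), assumed here to be applied identically at index and query time. The sample value "Box no. 12" and the `LIBRARY_STOPWORDS` set are hypothetical, for illustration only.

```python
import re

# Stopword list as quoted above; "no" is the entry that removed the query term.
DEFAULT_STOPWORDS = {
    "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
    "into", "is", "it", "no", "not", "of", "on", "or", "s", "such", "t",
    "that", "the", "their", "then", "there", "these", "they", "this", "to",
    "was", "will", "with",
}

# Hypothetical library-oriented list: keep "no" and "will" searchable, stop "a".
LIBRARY_STOPWORDS = (DEFAULT_STOPWORDS - {"no", "will"}) | {"a"}


def analyze(text, stopwords):
    """Rough stand-in for a tokenizer + lowercase filter + stop filter."""
    # Split on runs of non-alphanumeric characters, so "no." becomes "no".
    tokens = [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]
    # Drop every token that appears in the stopword set.
    return [t for t in tokens if t not in stopwords]


def matches(query, field_value, stopwords):
    """True if the query has terms left and all of them occur in the field."""
    query_terms = analyze(query, stopwords)
    field_terms = set(analyze(field_value, stopwords))
    # A query made only of stopwords analyzes to nothing and matches nothing.
    return bool(query_terms) and all(t in field_terms for t in query_terms)


if __name__ == "__main__":
    value = "Box no. 12"  # hypothetical ID field value
    print(analyze(value, DEFAULT_STOPWORDS))
    print(matches("no", value, DEFAULT_STOPWORDS))
    print(matches("no", value, LIBRARY_STOPWORDS))
```

With the stock list, "no" disappears on both sides, so the query is empty and cannot match. In Solr itself the equivalent fix would be editing the file named in the StopFilterFactory `words` attribute (stopwords.txt by default in the example schema) and reindexing.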
Thanks,
Eric James
Yale University Libraries