LISTSERV 16.5 - CODE4LIB Archives

Thanks, Erik. There is no specific reason for their removal; I think it is just that the StopFilterFactory comes preconfigured in the analyzer chain for fieldType=text.  We will do some performance testing with this filter removed.

 

BTW, a useful tool for deciding on appropriate stopwords is the schema browser, which can be found on the /solr/admin page.  There you can see term frequencies for each field, sorted from highest frequency down, which helps weed out terms of little querying value.
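
For anyone who wants to try the same idea away from the admin page, here is a minimal sketch of ranking terms by frequency. It does not talk to Solr at all: the function name top_terms, the regex tokenizer, and the sample field values are all made up for illustration, and a real analyzer chain would tokenize differently.

```python
from collections import Counter
import re


def top_terms(docs, n=5):
    """Return the n most frequent lowercase word tokens across docs."""
    counts = Counter()
    for text in docs:
        # Crude stand-in for a real tokenizer: runs of letters and digits.
        counts.update(re.findall(r"[a-z0-9]+", text.lower()))
    return counts.most_common(n)


# Hypothetical values for a single field, invented for this example.
sample_docs = [
    "Papers of the library committee",
    "Minutes of the committee, box no. 4",
    "Letters to the librarian",
]

for term, freq in top_terms(sample_docs, n=3):
    print(term, freq)
```

The terms at the top of such a list are the candidates to look at by hand; high frequency alone does not mean a term has no querying value.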

 

Eric  
 
> Date: Thu, 12 Nov 2009 09:06:46 -0500
> From: [log in to unmask]
> Subject: Re: [CODE4LIB] solr | StopFilterFactory - stopwords.txt
> To: [log in to unmask]
> 
> I often recommend against stop word removal altogether. Is there any 
> reason you need to remove them?
> 
> The primary reason stop words get removed is to increase performance 
> of queries with very common terms. If you are encountering that, 
> using Solr's CommonGramsFilter(Factory) is a good solution to keep 
> your stop words and alleviate the performance degradation potential. 
> The HathiTrust folks have had success with the common grams capability.
> 
> Erik
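
To make the common grams idea concrete, here is a simplified sketch of it as I understand it: every token is kept, and a joined bigram is added wherever either neighbor is a common word, so a phrase query can match on the rarer bigram instead of the very frequent single term. This is not Solr's implementation; the function name, the "_" separator choice, and the COMMON set are assumptions for illustration, and token positions are not modeled.

```python
def common_grams(tokens, common_words):
    """Emit each token, plus a bigram for pairs containing a common word."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            # Pair adjacent tokens when either one is a common word.
            if tok in common_words or nxt in common_words:
                out.append(tok + "_" + nxt)
    return out


# Hypothetical common-word list, invented for this example.
COMMON = {"the", "of", "to"}

print(common_grams(["papers", "of", "the", "committee"], COMMON))
```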
> 
> 
> On Nov 11, 2009, at 3:41 PM, Eric James wrote:
> 
> > Has anyone already given some thought to refining the Solr 
> > stopwords.txt for library collections, particularly finding aids? 
> > The unimportance of some of the words included in the out-of-the-box 
> > stopwords.txt is very questionable:
> >
> > <an and are as at be but by for if in into is it not of on or s such 
> > t that the their then there these they this to was will with>
> >
> >
> >
> > We were indexing a field id with "no." as one of its tokens (for 
> > number), and wanted a query for "no" (where the person did not add 
> > the period) to find the doc, but in practice the "no" was getting 
> > stripped by the StopFilterFactory. That is how we stumbled upon this 
> > list, and we were a bit surprised by some of the inclusions (e.g. "will") 
> > and exclusions (e.g. "a").
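
The "no." case can be reproduced with a toy analyzer. This is only a sketch under stated assumptions: the regex tokenizer stands in for whatever tokenizer the field really uses, the name analyze is made up, and the small STOPWORDS set assumes "no" is on the active stop list, since that is the behavior reported here.

```python
import re

# Assumption: "no" is on the active stop list, as the stripping described
# in this message implies. The other entries are only for illustration.
STOPWORDS = frozenset({"no", "of", "the"})


def analyze(text, stopwords=frozenset()):
    """Lowercase, split on non-alphanumerics, then drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in stopwords]


print(analyze("MS no. 123", STOPWORDS))  # stop filter on: "no" is dropped
print(analyze("MS no. 123"))             # stop filter off: "no" survives
print(analyze("no", STOPWORDS))          # the query itself becomes empty
```

With the filter on, both the indexed token and the query term disappear, so a search for "no" has nothing left to match on.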
> >
> >
> >
> > Thanks,
> >
> > Eric James
> >
> > Yale University Libraries
> >