Thanks, Erik. There is no specific reason for their removal; I think it was just that the StopFilterFactory is preconfigured in the analyzer chain for fieldType=text. We will do some performance testing with this filter removed.

BTW, a useful tool for deciding on appropriate stopwords is the schema browser, which can be found on the /solr/admin page. There you can see term frequencies for each field, sorted from highest frequency down, which helps weed out the terms of little querying value.

Eric

> Date: Thu, 12 Nov 2009 09:06:46 -0500
> From: [log in to unmask]
> Subject: Re: [CODE4LIB] solr | StopFilterFactory - stopwords.txt
> To: [log in to unmask]
>
> I often recommend against stop word removal altogether. Is there any
> reason you need to remove them?
>
> The primary reason stop words get removed is to increase performance
> of queries with very common terms. If you are encountering that,
> using Solr's CommonGramsFilter(Factory) is a good solution to keep
> your stop words and alleviate the performance degradation potential.
> The HathiTrust folks have had success with the common grams capability.
>
> Erik
>
>
> On Nov 11, 2009, at 3:41 PM, Eric James wrote:
>
> > Has anyone already given some thought to refining the solr
> > stopwords.txt for library collections, particularly finding aids?
> > The words included in the out of the box stopwords.txt are of very
> > questionable unimportance:
> >
> > <an and are as at be but by for if in into is it not of on or s such
> > t that the their then there these they this to was will with>
> >
> > We were indexing a field id with "no." as one of its tokens (for
> > number), but wanted a query with "no" (where the person did not add
> > the period) to find the doc, but in actuality the "no" would get
> > stripped by the StopFilterFactory. And thus we stumbled upon this
> > list, and were a bit surprised by some of the inclusions (ex: "will")
> > and exclusions (ex: "a").
> >
> > Thanks,
> >
> > Eric James
> >
> > Yale University Libraries
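
To make the "no." case and the term-frequency idea from this thread concrete, here is a minimal Python sketch. It is a toy model, not Solr's implementation: the tokenizer, the function names, and the small stop list are all assumptions made for illustration ("no" is included only because the message reports it being stripped). The real behavior depends on the tokenizer and filters configured for the field type.

```python
import re
from collections import Counter

# Hypothetical stop list for illustration only; not the shipped stopwords.txt.
STOPWORDS = frozenset({"an", "and", "no", "of", "the", "will"})


def tokenize(text):
    """Toy tokenizer (assumption): lowercase and split on non-alphanumerics,
    so "no." becomes the token "no"."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stop_filter(tokens, stopwords):
    """Drop every token that appears in the stop list."""
    return [t for t in tokens if t not in stopwords]


def term_frequencies(docs):
    """Count how often each term occurs across a collection of documents,
    roughly the kind of per-field ranking a term-frequency view would show."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


if __name__ == "__main__":
    # With "no" in the stop list, the token is removed at index and query time.
    print(stop_filter(tokenize("Box no. 12"), STOPWORDS))
    # Removing "no" from the stop list keeps it searchable.
    print(stop_filter(tokenize("Box no. 12"), STOPWORDS - {"no"}))
    # High-frequency terms are candidates to review for the stop list.
    docs = ["the papers of the society", "the will of John"]
    print(term_frequencies(docs).most_common(3))
```

Ranking terms this way only suggests candidates; a word like "will" can be frequent and still carry meaning in finding aids, so each candidate still needs a human look.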