We had stemming on for authors at first (maybe it was the VuFind default way
back when?) and turned it off as soon as we noticed. The initial complaint
was that searching on "Rowles" gave records for "Rowling", and of course
it's not hard to find other examples, esp. with the -ing suffix.
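To make the Rowles/Rowling collision concrete, here is a minimal Python sketch. It uses a hypothetical, deliberately crude suffix stripper (`toy_stem`) as a stand-in for the Porter/Snowball-style stemmer a real Solr schema would configure, so its exact output is illustrative only. It models the setup described in this thread: the same analysis runs over the field at index time and over the query, so two different surnames that reduce to the same stem end up matching each other. All function names are made up for the example.

```python
import re

# Hypothetical toy stemmer: NOT Solr's actual stemmer, just a crude
# suffix stripper that is enough to reproduce the false match.
SUFFIXES = ("ing", "es", "s")


def toy_stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 chars."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text: str, stem: bool) -> set[str]:
    """Lowercase, split into alphabetic tokens, and optionally stem each one."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {toy_stem(t) if stem else t for t in tokens}


def matches(query: str, field_value: str, stem: bool) -> bool:
    """True if every analyzed query term appears among the field's terms.

    The same analysis is applied to the stored field ("index time") and to
    the query ("query time"), as a stemmed Solr field would do.
    """
    return analyze(query, stem) <= analyze(field_value, stem)


author = "Rowling, J. K."
print(matches("Rowles", author, stem=True))   # True: both reduce to "rowl"
print(matches("Rowles", author, stem=False))  # False: exact tokens differ
print(matches("monkey", "Monkeys", stem=True))  # True: the intended benefit
```

Keeping an unstemmed copy of the author field (the `stem=False` path here) avoids the false match, at the cost of losing the singular/plural matching that can help with some corporate names.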

On Mon, Jun 13, 2011 at 8:08 PM, Jonathan Rochkind <[log in to unmask]> wrote:

> In a Solr-based search, stemming is done at indexing time, into fields with
> stemmed tokens.
>
> It seems typical in library-catalog type applications based on Solr to have
> the default (or even only) searches be over these stemmed fields, thus
> 'auto-stemming' to the user. (Search for 'monkey', find 'monkeys' too, and
> vice versa).
>
> I am curious how many people, who have Solr based catalogs (that is, I'm
> interested in people who have search engines with majority or only content
> originally from MARC), use such stemmed fields ('auto-stemming') over their
> _author_ fields as well.
>
> There are pros and cons to this. There are certainly some things in an
> author field that would benefit from stemming (mostly various kinds of
> corporate authors, some of whose endings end up looking like English-
> language phrases). There are also very many things in an author field that
> would not benefit from stemming, and thus when stemming is done it
> sometimes (often?) results in false matches, "pluralizing" an author's last
> name in an inappropriate way, for instance.
>
> So, wanna say on the list, if you are using a Solr-based catalog, are you
> using stemmed fields for your author searches? Curious what people end up
> doing.  If there are any other more complicated clever things you've done
> than just stem-or-not, let us know that too!
>
> Jonathan
>



-- 
Bill Dueber
Library Systems Programmer
University of Michigan Library