I've always preferred search engine-based spell checkers over other approaches. I've not seen a library application using a different strategy (dictionary- or corpus-based) that does nearly as well.
We've used the Yahoo [1] and Bing [2] spell check APIs for years now in our applications. They used to be free (Microsoft just ended that last month). But even now they are very reasonably priced (e.g., Yahoo charges $0.10 per 1,000 queries), and well worth it in my experience.
The only drawback is that they will suggest corrections that can result in zero hits in your application, especially if you are using them for a small collection like a local catalog. You can mitigate that by doing a quick pre-check for hits before showing the suggestion to users.
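If it's useful, here is a rough sketch of that pre-check (this assumes a Solr index behind your application; the URL and handler below are placeholders, not anything from our production setup):

    # Pre-check a spell API's suggestion against the local index before
    # showing it; suppress suggestions that would return zero hits.
    import requests

    SOLR_URL = "http://localhost:8983/solr/catalog/select"  # placeholder

    def has_hits(term):
        """True if the suggested term matches at least one local record."""
        resp = requests.get(SOLR_URL, params={"q": term, "rows": 0, "wt": "json"})
        return resp.json()["response"]["numFound"] > 0

    def vet_suggestion(suggestion):
        """Only surface the API's suggestion if it will actually find something."""
        return suggestion if has_hits(suggestion) else None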
--Dave
[1] http://developer.yahoo.com/search/boss/
[2] http://www.bing.com/developers/
-------------------------
David Walker
Interim Director, Systemwide Digital Library Services
California State University
562-355-4845
-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Jonathan Rochkind
Sent: Thursday, September 06, 2012 6:45 AM
To: [log in to unmask]
Subject: Re: [CODE4LIB] U of Baltimore, Final Usability Report, link resolvers -- MIA?
Solr has a feature to make spelling suggestions based on the actual terms in the corpus... but it's hardly a panacea. A straightforward, naive implementation of the Solr feature on top of a large library catalog corpus, in many of our experiences, still gives odd and unhelpful suggestions (including sometimes suggesting typos from the corpus, or taking an already 'correct' word and suggesting an entirely different but lexicographically similar word as a 'correction'). And then there's figuring out the right UI (and managing to make it work on top of the Solr feature) for multi-term queries where each independent part may or may not have a 'correction'.
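For anyone who wants to see the failure mode for themselves, the "naive implementation" is just a couple of request parameters. A minimal sketch (the /spell handler, core name, and query term are made up, and it assumes a SpellCheckComponent built from a catalog field in solrconfig.xml):

    # Query a Solr spellcheck handler the naive way. Because the dictionary
    # is built from the corpus itself, typos that made it into catalog
    # records can come back as "corrections."
    import requests

    resp = requests.get("http://localhost:8983/solr/catalog/spell", params={
        "q": "accomodate",
        "spellcheck": "true",
        "spellcheck.collate": "true",  # ask Solr to rewrite the whole query
        "spellcheck.count": 5,
        "wt": "json",
    })
    print(resp.json().get("spellcheck", {}))

spellcheck.collate is Solr's gesture at the multi-term problem, but turning its output into a sane UI is its own project.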
Turns out spelling suggestion is kind of hard. And it's kind of amazing that Google does it so well (they use some fairly complex techniques to do so, I think, based on a whole bunch of data and metadata they have, including past searches and clickthroughs, not just the corpus).
________________________________________
From: Code for Libraries [[log in to unmask]] on behalf of Ross Singer [[log in to unmask]]
Sent: Thursday, September 06, 2012 9:37 AM
To: [log in to unmask]
Subject: Re: [CODE4LIB] U of Baltimore, Final Usability Report, link resolvers -- MIA?
On Thu, Sep 6, 2012 at 9:06 AM, Cindy Harper <[log in to unmask]> wrote:
> I was going to comment that some of the Encore shortcomings mentioned
> in the PDF do seem to be addressed in current Encore versions,
> although some issues still have to be addressed - for instance,
> there is a spell-check, but it can give some surprising suggestions,
> though the suggestions do at least clue the user in to the fact that
> they might have a misspelling/typo.
I wrote about the woeful state of "spelling suggestions" a couple of years ago (among a lot of other things):
http://www.inthelibrarywiththeleadpipe.org/2009/were-gonna-geek-this-mother-out/
(you can skip on down to the "In the Absence of Suggestion, There is Always Search..." section - it's pretty TL;DR-worthy)
The crux of it is: as long as spelling suggestions are based on standard dictionaries and not built /on the actual terms and phrases in the collection/, it's going to be a basically worthless feature.
I do note there, though, that BiblioCommons apparently must build their dictionaries on the metadata in the system.
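(For the curious, the collection-built approach doesn't have to be fancy. A back-of-the-envelope sketch in the spirit of Norvig's well-known spelling corrector, with the record source left hypothetical:

    # Build the "dictionary" from words that actually occur in your records,
    # then suggest the most frequent collection term within one edit.
    from collections import Counter
    import re

    def collection_terms(records):
        counts = Counter()
        for text in records:  # e.g. title/subject strings from your metadata
            counts.update(re.findall(r"[a-z]+", text.lower()))
        return counts

    def edits1(word):
        """All strings one edit away: deletes, transposes, replaces, inserts."""
        letters = "abcdefghijklmnopqrstuvwxyz"
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        return set(
            [L + R[1:] for L, R in splits if R] +                         # delete
            [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1] +  # transpose
            [L + c + R[1:] for L, R in splits if R for c in letters] +    # replace
            [L + c + R for L, R in splits for c in letters]               # insert
        )

    def suggest(word, counts):
        if word in counts:                 # already a term we can find
            return word
        candidates = [w for w in edits1(word) if w in counts]
        return max(candidates, key=counts.get) if candidates else word

By construction, every suggestion is a term the collection can actually find.)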
-Ross.
>
> III's reaction to studies that report that users ignore the right-side
> panel of search options was to provide a skin that has only two
> columns - the facets on the left, and the search results on the middle-to-right.
> This pushes important facets like the tag cloud very far down the
> page, and causes a lot of scrolling, so I don't like this skin much.
>
> I recently asked a question on the Encore users' list about how the
> tag cloud could be improved - currently it suggests the most common
> subfield $a values of the subject headings. I would think it should
> include the general, chronological, and geographical subdivisions -
> subfields $x, $y, and $z. For instance, it doesn't provide good
> suggestions for improving the search "civil war" without these; a
> chronological subdivision would help a lot there.
> But then again, I haven't seen a prototype of how many relevant
> subdivisions this would result in - would the subdivisions drown out
> the main headings in the tag cloud?
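> A quick way to check would be something like this rough sketch
> (hypothetical, using pymarc) - count the $a headings alongside their
> $x/$y/$z subdivisions and see what a subdivision-aware cloud contains:
>
>     # Count 650 $a headings plus their $x/$y/$z subdivisions from a
>     # MARC file, as raw material for a tag cloud.
>     from collections import Counter
>     from pymarc import MARCReader
>
>     facets = Counter()
>     with open("catalog.mrc", "rb") as fh:  # hypothetical export file
>         for record in MARCReader(fh):
>             for field in record.get_fields("650"):
>                 facets.update(field.get_subfields("a"))            # headings
>                 facets.update(field.get_subfields("x", "y", "z"))  # subdivisions
>     print(facets.most_common(25))  # candidate tag-cloud terms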
>
> Cindy Harper, Systems Librarian
> Colgate University Libraries
> [log in to unmask]
> 315-228-7363
>
>
>
> On Wed, Sep 5, 2012 at 5:30 PM, Jonathan LeBreton <[log in to unmask]> wrote:
>
>> Lucy Holman, Director of the U Baltimore Library, and a former
>> colleague of mine at UMBC, got back to me about this. Her reply puts this
>> particular document into context. It is an interesting reminder that not
>> everything you find on the web is as it seems, and it certainly is not
>> necessarily the final word. We gotta go buy the book!
>> Lucy is off-list, but asked me to post this on her behalf.
>> Her contact information is below, though....
>>
>> Very interesting discussion. This issue of what is right and feasible
>> in discovery services, and how to configure them, is important stuff
>> for many of our libraries; we should be able to build on the findings
>> and experiences of others rather than re-inventing the wheel locally....
>> (We use Summon.)
>>
>> - Jonathan LeBreton
>>
>>
>> ------------------------ begin Lucy's explanation --------------
>>
>> The full study and analysis are included in Chapter 14 of a new book,
>> Planning and Implementing Resource Discovery Tools in Academic
>> Libraries, Mary P. Popp and Diane Dallis (Eds).
>>
>> The project was part of a graduate Research Methods course in the
>> University of Baltimore's MS in Interaction Design and Information
>> Architecture program. Originally groups within the course conducted
>> task-based usability tests on EDS, Primo, Summon and Encore.
>> Unfortunately, the test environment of Encore led to many usability
>> issues that we believed were more a result of the test environment
>> than the product itself; therefore we did not report on Encore in the
>> final analysis. The study (and chapter) does offer findings on the
>> other three discovery tools.
>>
>> There were six student groups in the course; each group studied two
>> tools with the same user population (undergrad, graduate and faculty)
>> so that each tool was compared against the other three with each user
>> population overall. The .pdf that you found was the final report of
>> one of those six groups, so it only addresses two of the four tools.
>> The chapter is the only document that pulls the six portions of the study together.
>>
>> I would be happy to discuss this with any of you individually if you
>> need more information.
>>
>> Thanks for your interest in the study.
>>
>>
>> Lucy Holman, DCD
>> Director, Langsdale Library
>> University of Baltimore
>> 1420 Maryland Avenue
>> Baltimore, MD 21201
>> 410-837-4333
>>
>> ------------------------- end insert --------------------
>>
>> Jonathan LeBreton
>> Sr. Associate University Librarian
>> Temple University Libraries
>> Paley M138, 1210 Polett Walk, Philadelphia PA 19122
>> voice: 215-204-8231
>> fax: 215-204-5201
>> mobile: 215-284-5070
>> email: [log in to unmask]
>> email: [log in to unmask]
>>
>>
>> > -----Original Message-----
>> > From: Code for Libraries [mailto:[log in to unmask]] On
>> > Behalf Of karim boughida
>> > Sent: Tuesday, September 04, 2012 5:09 PM
>> > To: [log in to unmask]
>> > Subject: Re: [CODE4LIB] U of Baltimore, Final Usability Report, link resolvers -- MIA?
>> >
>> > Hi Tom,
>> > Top players are EDS, Primo and Summon....the only reason I see Encore
>> > in the mix is if you have other III products, which is not the case
>> > for the UBalt library. They now have WorldCat? Encore vs. Summon is an
>> > easy win for Summon.
>> >
>> > Let's wait for Jonathan LeBreton (Thanks BTW).
>> >
>> > Karim Boughida
>> >
>> > On Tue, Sep 4, 2012 at 4:26 PM, Tom Pasley <[log in to unmask]> wrote:
>> > > Yes, I'm curious to know too! Due to database/resource matching
>> > > or coverage perhaps (anyone's guess).
>> > >
>> > > Tom
>> > >
>> > > On Wed, Sep 5, 2012 at 7:50 AM, karim boughida <[log in to unmask]> wrote:
>> > >
>> > >> Hi All,
>> > >> Initially EDS, Primo, Summon, and Encore were considered but
>> > >> only Encore and Summon were tested. Do we know why?
>> > >>
>> > >> Thanks
>> > >> Karim Boughida
>> > >>
>> > >>
>> > >> On Tue, Sep 4, 2012 at 10:44 AM, Jonathan Rochkind
>> > >> <[log in to unmask]>
>> > >> wrote:
>> > >> > Hi helpful code4lib community, at one point there was a report online at:
>> > >> >
>> > >> >
>> > >> > http://student-iat.ubalt.edu/students/kerber_n/idia642/Final_Usability_Report.pdf
>> > >> >
>> > >> > David Walker tells me the report at that location included
>> > >> > findings about SFX and/or other link resolvers.
>> > >> >
>> > >> > I'm really interested in reading it. But it's gone from that
>> > >> > location, and I'm not sure if it's somewhere else (I don't have a
>> > >> > title/author to search for other than that URL, which is not in
>> > >> > Google's cache or the Internet Archive).
>> > >> >
>> > >> > Is anyone reading this familiar with the report? Perhaps one of
>> > >> > the authors is reading this, or someone reading it knows one of
>> > >> > the authors and can put me in touch? Or knows someone likely in
>> > >> > the relevant dept at UBalt and can put me in touch? Or has any
>> > >> > other information about this report or ways to get it?
>> > >> >
>> > >> > Thanks!
>> > >> >
>> > >> > Jonathan
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> Karim B Boughida
>> > >> [log in to unmask]
>> > >> [log in to unmask]
>> > >>
>> >
>> >
>> >
>> > --
>> > Karim B Boughida
>> > [log in to unmask]
>> > [log in to unmask]
>>