On Nov 27, 2006, at 5:04 PM, Jonathan Rochkind wrote:
> Bess Sadler wrote:
>> application. That way you can use solr / lucene for search, faceted
>> browse, etc, and your XML database only for known item retrieval,
>> which it is generally able to do without performance issues. I'm
>> hopping up and down waiting for someone to take this approach with an
>> ILS, so please come and show us what you've got!
>>
> Would this approach complicate hilighting of hits-in-context? One of
> the biggest things missing from most current OPACs in my opinion is
> google-style excerpting of WHAT part of the record matched the
> query--on the results page. Many mainstream OPACs do currently
> provide some form of hilighting on the detail/full-bib page, but it's
> not generally truly identifying _which_ parts of the record
> _actually_ matched your search (a search just on title will still
> hilight the word found in a non-title field), which I find annoying.
>
> Do these kind of hybrid approaches complicate the task of providing
> proper result hilighting in context, or am I off on the wrong
> direction?

Highlighting is tricky business all the way around. XTF seems to be
the best solution I've seen for very detailed context highlighting.
I suspect ILS systems don't need that level of complexity, though, and
could instead leverage Solr's highlighting capabilities by ensuring
that the specific fields that need highlighting are stored.

Solr's highlighter (which is Lucene's contrib Highlighter under the
covers) does do field-specific highlighting, but it is still not
perfect. For example, if you searched for title:"Blessed Damozel", it
would highlight "blessed" and "damozel" anywhere in the title field,
even though the query is a phrase query where proximity matters. For a
proprietary contract job, I have written code that converts a general
Lucene Query into a SpanQuery, plus a highlighter that does precise
highlighting.
Beyond being proprietary, so that I cannot share it, that code is also
not general purpose: it highlights the full field rather than scoring
fragments as the Lucene Highlighter does. Converting to a SpanQuery is
a good way to go, though, and has been discussed a bit on the Lucene
e-mail list.

In short, I think the hybrid approach is still a good one, separating
the search engine from the actual data repository, but highlighting
requirements need to be considered up front. Basic, decent
field-specific highlighting can be achieved with Solr, but it's got
caveats.

	Erik
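
To make the phrase-versus-term caveat concrete, here is a toy sketch in
Python. It is not Lucene's Highlighter and not the SpanQuery-based code
mentioned above; the function names (highlight_terms, highlight_phrase)
and the <em> markers are made up for illustration, and it uses only the
standard library. It assumes a simple word tokenizer and an exact,
in-order phrase with no slop.

```python
import re


def tokenize(text):
    """Return (lowercased_token, start, end) for each word in text."""
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"\w+", text)]


def _wrap(text, spans, pre, post):
    """Wrap each non-overlapping (start, end) span of text in pre/post markers."""
    out, last = [], 0
    for start, end in spans:
        out.append(text[last:start])
        out.append(pre + text[start:end] + post)
        last = end
    out.append(text[last:])
    return "".join(out)


def highlight_terms(text, terms, pre="<em>", post="</em>"):
    """Term-based highlighting: mark every occurrence of any query term,
    ignoring whether the terms appear together as a phrase."""
    wanted = {t.lower() for t in terms}
    spans = [(s, e) for tok, s, e in tokenize(text) if tok in wanted]
    return _wrap(text, spans, pre, post)


def highlight_phrase(text, terms, pre="<em>", post="</em>"):
    """Phrase-aware highlighting: mark the terms only where they occur
    adjacent to each other and in query order."""
    phrase = [t.lower() for t in terms]
    if not phrase:
        return text
    toks = tokenize(text)
    spans, i = [], 0
    while i <= len(toks) - len(phrase):
        window = [tok for tok, _, _ in toks[i:i + len(phrase)]]
        if window == phrase:
            # Highlight from the start of the first term to the end of the last.
            spans.append((toks[i][1], toks[i + len(phrase) - 1][2]))
            i += len(phrase)
        else:
            i += 1
    return _wrap(text, spans, pre, post)


title = "The Blessed Damozel, and damozel blessed again"
print(highlight_terms(title, ["blessed", "damozel"]))
print(highlight_phrase(title, ["blessed", "damozel"]))
```

The first call marks all four words, which is roughly the behavior
described for title:"Blessed Damozel"; the second marks only the one
place where the two words actually form the phrase.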