On Nov 27, 2006, at 5:04 PM, Jonathan Rochkind wrote:

> Bess Sadler wrote:
>> application. That way you can use solr / lucene for search, faceted
>> browse, etc, and your XML database only for known item retrieval,
>> which it is generally able to do without performance issues. I'm
>> hopping up and down waiting for someone to take this approach with an
>> ILS, so please come and show us what you've got!
> Would this approach complicate highlighting of hits-in-context?  One of
> the biggest things missing from most current OPACs in my opinion is
> Google-style excerpting of WHAT part of the record matched the
> query--on the results page. Many mainstream OPACs do currently provide
> some form of highlighting on the detail/full-bib page, but it's not
> generally truly identifying _which_ parts of the record _actually_
> matched your search (a search just on title will still highlight the
> word found in a non-title field), which I find annoying.
> Do these kinds of hybrid approaches complicate the task of providing
> proper result highlighting in context, or am I off on the wrong
> direction?

Highlighting is tricky business all the way around.  XTF seems to be
the best solution I've seen for very detailed context highlighting.
I suspect an ILS doesn't need that level of complexity, though, and
could instead leverage Solr's highlighting capabilities by ensuring
that the specific fields that need highlighting are stored.
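
To make that concrete, here is a minimal sketch of building a Solr
select URL that asks for highlighting on one field.  The `hl` and
`hl.fl` request parameters are Solr's; the host, port, path, and the
`build_highlight_url` helper are assumptions for illustration, and the
`title` field is assumed to be stored in your schema.

```python
from urllib.parse import urlencode

def build_highlight_url(base_url, query, highlight_fields):
    """Build a Solr select URL that requests highlighting on the given fields."""
    params = {
        "q": query,
        "hl": "true",                          # turn highlighting on
        "hl.fl": ",".join(highlight_fields),   # fields to highlight (must be stored)
    }
    return base_url + "?" + urlencode(params)

# Hypothetical local Solr instance; adjust to your own deployment.
url = build_highlight_url(
    "http://localhost:8983/solr/select",
    'title:"Blessed Damozel"',
    ["title"],
)
print(url)
```

Nothing is sent over the wire here; the sketch only shows which
parameters carry the highlighting request.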

Solr's highlighter (which is Lucene's contrib Highlighter under the
covers) does do field-specific highlighting, but it still is not
perfect.  For example, if you searched for title:"Blessed Damozel",
it would highlight every occurrence of "blessed" and "damozel" in the
title field, adjacent or not, even though the query is a phrase query
where proximity matters.
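
A toy illustration of that caveat, assuming a highlighter that only
knows the individual query terms.  This is not the Lucene Highlighter
code, just a stand-in that behaves the same way for this case; the
sample title and the `highlight_terms` helper are made up.

```python
import re

def highlight_terms(text, terms, pre="<em>", post="</em>"):
    """Wrap every occurrence of any query term, ignoring phrase proximity."""
    wanted = {t.lower() for t in terms}

    def mark(match):
        word = match.group(0)
        return pre + word + post if word.lower() in wanted else word

    return re.sub(r"\w+", mark, text)

# The phrase "Blessed Damozel" never occurs here, yet both words get marked.
title = "Damozel poems, blessed and otherwise"
marked = highlight_terms(title, ["blessed", "damozel"])
print(marked)  # <em>Damozel</em> poems, <em>blessed</em> and otherwise
```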

For a proprietary contract job, I have written code that converts a
general Lucene Query into a SpanQuery, plus a highlighter that uses
the resulting spans to do precise highlighting.  I cannot share that
code because it is proprietary.  It is also not general purpose: it
highlights the full field rather than scoring fragments the way the
Lucene highlighter does.  Converting to a SpanQuery is a good way to
go though, and has been discussed a bit on the Lucene e-mail list.
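
To show the idea behind span-based highlighting (and emphatically not
the proprietary code mentioned above), here is a small sketch that
highlights only where the phrase terms occur consecutively and in
order, roughly what an in-order span match with zero slop would
report.  It does full-field highlighting with no fragment scoring, and
`phrase_spans` and `highlight_phrase` are hypothetical helper names.

```python
import re

def phrase_spans(tokens, phrase):
    """Return (start, end) token ranges where the phrase terms occur consecutively."""
    n = len(phrase)
    lowered = [t.lower() for t in tokens]
    target = [p.lower() for p in phrase]
    return [(i, i + n) for i in range(len(tokens) - n + 1)
            if lowered[i:i + n] == target]

def highlight_phrase(text, phrase, pre="<em>", post="</em>"):
    """Highlight only real phrase matches, leaving stray single terms alone."""
    matches = list(re.finditer(r"\w+", text))
    tokens = [m.group(0) for m in matches]
    out, cursor = [], 0
    for start, end in phrase_spans(tokens, phrase):
        begin_char = matches[start].start()
        end_char = matches[end - 1].end()
        if begin_char < cursor:
            continue  # skip a span that overlaps one already highlighted
        out.append(text[cursor:begin_char])
        out.append(pre + text[begin_char:end_char] + post)
        cursor = end_char
    out.append(text[cursor:])
    return "".join(out)

phrase = ["blessed", "damozel"]
print(highlight_phrase("The Blessed Damozel", phrase))
print(highlight_phrase("Damozel poems, blessed and otherwise", phrase))
```

The first title is highlighted as a whole phrase; the second comes back
untouched because the two words never sit next to each other.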

In short, I think the hybrid approach is still a good one, separating
the search engine from the actual data repository, but highlighting
requirements need to be considered up front.  Basic and decent
field-specific highlighting can be achieved with Solr, but it's got
caveats.