Hmm - you're scaring me. I was thinking of the explain function, which
in Luke gives you access to the term frequency: so for a query
"title:alberta OR body:alberta", you can establish for a given hit the
term frequency in those two fields. I see in the API docs, however:

"This is intended to be used in developing Similarity implementations,
and, for good performance, should not be displayed with every hit.
Computing an explanation is as expensive as executing the query over the
entire index."

http://lucene.apache.org/java/docs/api/org/apache/lucene/search/IndexSearcher.html#explain(org.apache.lucene.search.Weight,%20int)

So maybe that's not such a great idea. But to meet Jonathan's
requirement, you could at least only run your term-highlighting function
against the fields included in the query.
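
To make that last suggestion concrete, here is a minimal sketch in plain Python. It is not Lucene or Solr code: the helper names (query_terms_by_field, highlight_hit), the dict-shaped hit, and the simple "field:term" query syntax are all assumptions for illustration. The idea is just to read the field names out of the query and run the highlighter over those fields only, leaving every other field untouched.

```python
import re

# Assumption: queries use the simple "field:term" form, e.g.
# "title:alberta OR body:alberta". Real Lucene query syntax is richer.
FIELD_TERM = re.compile(r"(\w+):(\w+)")


def query_terms_by_field(query):
    """Map each field named in the query to the set of terms searched in it."""
    terms = {}
    for field, term in FIELD_TERM.findall(query):
        terms.setdefault(field, set()).add(term.lower())
    return terms


def highlight_hit(hit, query, mark="*"):
    """Return a copy of hit with query terms marked, only in queried fields.

    hit is a hypothetical dict of field name -> stored text.
    """
    out = dict(hit)
    for field, terms in query_terms_by_field(query).items():
        if field not in hit:
            continue  # the query names a field this hit does not store
        alternatives = "|".join(re.escape(t) for t in sorted(terms))
        pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
        out[field] = pattern.sub(lambda m: mark + m.group(0) + mark, hit[field])
    return out


hit = {
    "title": "Alberta history",
    "body": "A survey of Alberta.",
    "notes": "alberta appears here too",
}
print(highlight_hit(hit, "title:alberta OR body:alberta"))
```

The "notes" field is skipped even though it contains the term, because the query never searched it. That avoids the cost of explain() on every hit, but it only tells you which fields were searched, not which ones actually matched.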

Peter

-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
Erik Hatcher
Sent: Monday, November 27, 2006 5:37 PM
To: [log in to unmask]
Subject: Re: [CODE4LIB] code4lib lucene pre-conference

On Nov 27, 2006, at 5:46 PM, Binkley, Peter wrote:
> You've got enough flexibility in the way you set up your Lucene index,
> and Lucene search results give you access to the term weights for each
> hit,

It does?

> so you can tell which fields actually
> matched.

You can?

I'm curious how you're doing that!  Especially with Solr in the picture.

> There would probably be a lot of optimizations you could do within
> Solr to help with this kind of thing. Art and I talked a little about
> this at the ILS symposium: why not nestle the XML db inside Solr
> alongside Lucene? Solr could then manage the indexing of the contents
> of the db, and augment your search results with data from the db: you
> could get full records as part of your search results without having
> to store them in the Lucene index.

There have been discussions in the Solr community about having hooks
added to allow Solr plugins to pull data from external sources to return
with search results.  I don't think Solr itself is the entry point to
these external systems, as that seems to couple things a bit too much
for my tastes, so I think you'd still want to manage the external data
source separately from indexing into Solr, but having hooks for Solr to
return hybrid results could be just the ticket here.
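
As a rough illustration of what I mean by hybrid results, here is a sketch in plain Python. None of this is an actual Solr plugin API: hybrid_results, the fetch_record callback, and the dict standing in for the external store are all hypothetical. The search side returns only ids and scores, and a hook fills in the full record from a separately managed source.

```python
def hybrid_results(hits, fetch_record):
    """Attach an externally stored record to each search hit.

    hits: list of dicts with at least an "id" key (hypothetical shape).
    fetch_record: callable taking an id and returning the full record,
        or None if the external source has nothing for that id.
    """
    merged = []
    for hit in hits:
        combined = dict(hit)  # keep the index-side fields (id, score, ...)
        combined["record"] = fetch_record(hit["id"])
        merged.append(combined)
    return merged


# Stand-in for an external data source such as an XML database.
external_store = {
    "doc1": "<record><title>Alberta history</title></record>",
    "doc2": "<record><title>Prairie provinces</title></record>",
}

hits = [{"id": "doc1", "score": 1.7}, {"id": "doc3", "score": 0.4}]
for row in hybrid_results(hits, external_store.get):
    print(row)
```

The index never stores the full records, and the external source is loaded and maintained on its own schedule; the only coupling is the shared id.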

        Erik