I think some of the new TermVectorComponent stuff might be applicable... I've not experimented with it yet, though, so YMMV.

http://wiki.apache.org/solr/TermVectorComponent

It's only part of 1.4, which is due for release any day now, once they patch up a Lucene bug.

On Fri, Oct 16, 2009 at 3:52 PM, Eric James <[log in to unmask]> wrote:
> Thanks for your response. But yes, I'm able to use facets in general, and yes, I'm able to do highlighting on stored fields.
>
> But finding how many times the query appears in the full text is my question. For example, say you search on "Heisenberg". We'd like to see:
>
> Hit 1: Your search for Heisenberg appears 10 times within the Finding Aid
> Hit 2: Your search for Heisenberg appears 3 times within the Finding Aid
> Hit 3: Your search for Heisenberg appears 88 times within the Finding Aid
> etc.
>
> Could there be a Solr parameter that calculates this? Otherwise, a klugey, not very scalable method could be: once you retrieve a Solr result XML, find the Fedora PID, retrieve the EAD full text, run a standard function to count how many times the query appears in the text for each hit, and add parameters with these counts back into the XML.
>
>> Date: Fri, 16 Oct 2009 15:27:42 -0400
>> From: [log in to unmask]
>> Subject: Re: [CODE4LIB] solr - search query count | highlighting
>> To: [log in to unmask]
>>
>> Hi Eric,
>>
>> You do not have to store the entire text content of the EAD guide in order
>> to enable facets. Here's an example:
>> http://kittredgecollection.org/results?q=*:* . There are about 15 facets
>> enabled on a collection of almost 1500 EAD documents (though quite small in
>> file size compared to traditional EAD finding aids), and there's no slowdown
>> whatsoever. I don't believe you need to store the guides to enable
>> highlighting either, though I have heard there is some dropoff in
>> performance with highlighting enabled.
>> I've never done benchmarking on highlighting enabled versus disabled, so I
>> can't tell you how much of a dropoff there is. In an index of only several
>> hundred documents, I would think that the dropoff with highlighting enabled
>> would be fairly negligible.
>>
>> Ethan
>>
>> On Fri, Oct 16, 2009 at 3:12 PM, Eric James <[log in to unmask]> wrote:
>>
>> > For our finding aids, we are using fedoragenericsearch 2.2 with Solr as the
>> > index. Because the EADs can be huge, the EADs are indexed but not stored
>> > (with stored EADs, search time for ~500 objects = 20 min rather than < 1
>> > sec).
>> >
>> > However, we would like to show the number of times the search terms occur
>> > within each hit. For example, CDL's collection:
>> >
>> > http://www.oac.cdlib.org/search?query=Donner
>> >
>> > We would also like highlighting/snippets of the search term similar to
>> > CDL's.
>> >
>> > Is it a lost cause to have this functionality without storing the EAD? Is
>> > there a way to store the EAD and have a reasonable response time?
>> >
>> > ---
>> > Eric James
>> > Yale University Libraries
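A minimal sketch of the counting step in the client-side workaround Eric describes in his follow-up (take each Solr hit, fetch its EAD full text by Fedora PID, count the query in it). It assumes the full text is already retrievable as a string; `count_term_occurrences`, `annotate_hits`, the `pid` key and the `fetch_full_text` callback are hypothetical names, not part of Solr or fedoragenericsearch. The case-insensitive, whole-word match is only an approximation of what Solr's analyzer would count (no stemming, no stopword handling).

```python
import re
from typing import Callable, Dict, List


def count_term_occurrences(text: str, term: str) -> int:
    """Count case-insensitive, whole-word occurrences of `term` in `text`."""
    # re.escape treats the query literally; the \b anchors stop "Heisenberg"
    # from matching inside a longer word such as "Heisenbergs".
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return len(pattern.findall(text))


def annotate_hits(
    hits: List[Dict[str, str]],
    fetch_full_text: Callable[[str], str],
    term: str,
) -> List[Dict[str, object]]:
    """Return copies of the hits, each with an added "count" entry.

    Hypothetical shape: each hit is a dict carrying the Fedora PID under
    "pid", and fetch_full_text(pid) returns that object's EAD full text.
    """
    return [
        {**hit, "count": count_term_occurrences(fetch_full_text(hit["pid"]), term)}
        for hit in hits
    ]


if __name__ == "__main__":
    # Stand-in for fetching EAD text from Fedora; these PIDs are made up.
    fake_repository = {
        "demo:1": "Letters from Heisenberg ... more Heisenberg material.",
        "demo:2": "Papers of Niels Bohr.",
    }
    results = annotate_hits(
        [{"pid": "demo:1"}, {"pid": "demo:2"}],
        fake_repository.__getitem__,
        "heisenberg",
    )
    for hit in results:
        print(f'{hit["pid"]}: your search appears {hit["count"]} times')
```

Because every hit triggers a full-text fetch, this has exactly the scaling problem Eric anticipates. The TermVectorComponent suggested at the top of the thread is the index-side alternative: it can report per-document term frequencies without re-reading the documents, provided the field is indexed with term vectors (`termVectors="true"` in schema.xml).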