LISTSERV 16.5 - CODE4LIB Archives

i think some of the new TermVectorComponent stuff might be
applicable...i've not experimented with it yet tho, so YMMV.

     http://wiki.apache.org/solr/TermVectorComponent

it's only part of Solr 1.4, which is due for a release any day now, once
they patch up a Lucene bug.
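
A rough, untested-against-a-live-server sketch of what that might look like
from Python. Assumptions, none of which come from this thread: the full-text
field is called "fulltext" and is indexed with termVectors="true" in
schema.xml, the request handler at the URL below has the TermVectorComponent
registered, and the JSON response uses Solr's flat key/value list layout.
The sample response is made up for illustration; check the real output of
your Solr version before relying on the parsing.

```python
from urllib.parse import urlencode


def build_tv_query(base_url, query, field):
    """Build a Solr URL that asks for per-document term frequencies.

    base_url and field are placeholders; the handler behind base_url is
    assumed to have the TermVectorComponent configured.
    """
    params = {
        "q": f"{field}:{query}",
        "fl": "id",
        "tv": "true",      # switch the TermVectorComponent on
        "tv.tf": "true",   # include term frequency per document
        "tv.fl": field,    # restrict term vectors to this field
        "wt": "json",
    }
    return f"{base_url}?{urlencode(params)}"


def flat_to_dict(flat):
    """Turn a flat [key, value, key, value, ...] list into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


def term_counts(response, field, term):
    """Return {document id: term frequency} for one indexed term.

    Assumes the flat-list response layout shown in SAMPLE below.
    """
    counts = {}
    for doc_key, doc_info in flat_to_dict(response["termVectors"]).items():
        if not isinstance(doc_info, list):
            continue  # skip scalar entries such as a unique-key field name
        info = flat_to_dict(doc_info)
        terms = flat_to_dict(info.get(field, []))
        stats = flat_to_dict(terms.get(term, []))
        if "tf" in stats:
            counts[info.get("uniqueKey", doc_key)] = stats["tf"]
    return counts


# Hypothetical response fragment, hand-written for illustration only.
SAMPLE = {
    "termVectors": [
        "doc-1", ["uniqueKey", "doc-1",
                  "fulltext", ["heisenberg", ["tf", 10], "physics", ["tf", 4]]],
        "doc-2", ["uniqueKey", "doc-2",
                  "fulltext", ["heisenberg", ["tf", 3]]],
    ]
}

if __name__ == "__main__":
    print(build_tv_query("http://localhost:8983/solr/select",
                         "Heisenberg", "fulltext"))
    print(term_counts(SAMPLE, "fulltext", "heisenberg"))
```

Note that the lookup term has to be the analyzed form as it sits in the index
(lowercased, possibly stemmed), not the raw query string, and that the
component returns vectors for the matching documents' terms, so the response
could get large for big EADs.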


On Fri, Oct 16, 2009 at 3:52 PM, Eric James <[log in to unmask]> wrote:
> Thanks for your response.  But, yes I'm able to use facets in general, and yes I'm able to do highlighting on stored fields.
>
> But finding how many times the query appears in the full text is my question. For example, say you search on "Heisenberg". We'd like to see:
>
> Hit 1: Your search for Heisenberg appears 10 times within the Finding Aid
>
> Hit 2: Your search for Heisenberg appears 3 times within the Finding Aid
>
> Hit 3: Your search for Heisenberg appears 88 times within the Finding Aid
>
> etc
>
> Is there a Solr parameter that calculates this? Otherwise, a kludgy, not very scalable method would be: once you retrieve the Solr result XML, find the Fedora PID for each hit, retrieve the EAD full text, run a standard function to count how many times the query appears in that text, and add the counts back into the XML.
>
>> Date: Fri, 16 Oct 2009 15:27:42 -0400
>> From: [log in to unmask]
>> Subject: Re: [CODE4LIB] solr - search query count | highlighting
>> To: [log in to unmask]
>>
>> Hi Eric,
>>
>> You do not have to store the entire text content of the EAD guide in order
>> to enable facets. Here's an example:
>> http://kittredgecollection.org/results?q=*:* . There are about 15 facets
>> enabled on a collection of almost 1500 EAD documents (though quite small in
>> filesize compared to traditional EAD finding aids), and there's no slowdown
>> whatsoever. I don't believe you need to store the guides to enable
>> highlighting either, though I have heard there is some dropoff in
>> performance with highlighting enabled. I've never done benchmarking on
>> highlighting enabled versus disabled, so I can't tell you how much of a
>> dropoff there is. In an index of only several hundred documents, I would
>> think that the dropoff with highlighting enabled would be fairly negligible.
>>
>> Ethan
>>
>> On Fri, Oct 16, 2009 at 3:12 PM, Eric James <[log in to unmask]> wrote:
>>
>> > For our finding aids, we are using fedoragenericsearch 2.2 with Solr as
>> > the index. Because the EADs can be huge, they are indexed but not stored
>> > (with stored EADs, search time for ~500 objects is 20 min rather than
>> > < 1 sec).
>> >
>> >
>> >
>> > However, we would like to show the number of times the search terms
>> > occur within each hit. For example, CDL's collection:
>> >
>> > http://www.oac.cdlib.org/search?query=Donner
>> >
>> >
>> >
>> > Also we would like highlighting/snippets of the search term similar to
>> > CDL's.
>> >
>> >
>> >
>> > Is it a lost cause to have this functionality without storing the EAD? Is
>> > there a way to store the EAD and have a reasonable response time?
>> >
>> >
>> >
>> > ---
>> >
>> > Eric James
>> >
>> > Yale University Libraries
>> >
>