I wonder how the "field collapsing" patch holds up on an index of 3 million documents. That's probably larger than your EAD-only index, but I'm thinking about combining EAD in an index with many, many other documents (as in a library catalog). It might be fine, it might not.
(Even without field collapsing, my Solr index is really straining under the numerous facets I'm making it calculate and the dismax queries involving a dozen or more fields. I plan to reduce my fields, reduce my facets if possible, and, most importantly, give Solr a LOT more RAM than it has now. Complex queries with complex faceting on a several-million-document index require giving Solr a LOT more RAM for caches etc. than we initially expected; I throw this in as a note to anyone else in the planning stages.)
I've been brainstorming other weird ways to do this. This one is totally wacky and possibly a bad idea, but I'll throw it out there anyway. What if you indexed the entire EAD as only one document, BUT put the entire EAD in a stored field and used Solr highlighting on that field? NOT to show the highlighter results to the user, but to sort of trick the highlighter, using hl.fragmenter/fragmentsBuilder (possibly with a custom component in a jar), into telling you _which_ sub-sections of the EAD matched. Your software could then display the matching sub-sections (possibly with direct links to display them) in the search results, under the actual document hit.
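A rough sketch of a simpler client-side variant of that trick, with no custom fragmenter jar. The idea is to ask Solr to return the whole stored field (hl.fragsize=0) with matches wrapped in unique marker strings, then walk the EAD XML and report which components contain a marker. Everything here is an assumption for illustration: the field name `ead_xml`, the `[[HL]]` markers, and the helper names are hypothetical. It also assumes the field's analyzer only tokenizes element text (for example via an HTML-stripping char filter), so the markers never land inside tags and the returned XML still parses.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Hypothetical marker strings, chosen so they can't be confused with EAD markup.
HL_PRE = "[[HL]]"
HL_POST = "[[/HL]]"


def build_highlight_params(user_query, field="ead_xml"):
    """Build Solr /select parameters asking for the whole stored EAD back,
    with matched terms wrapped in the markers. 'ead_xml' is a hypothetical field."""
    params = {
        "q": user_query,
        "fl": "id",
        "hl": "true",
        "hl.fl": field,
        "hl.fragsize": "0",                 # 0 = no fragmenting, return the whole field
        "hl.maxAnalyzedChars": "10000000",  # big EADs can exceed the default limit
        "hl.simple.pre": HL_PRE,
        "hl.simple.post": HL_POST,
    }
    return urllib.parse.urlencode(params)


def is_component(tag):
    # EAD components are <c> or numbered <c01>..<c12>; any namespace is ignored.
    local = tag.rsplit("}", 1)[-1]
    return local == "c" or (len(local) == 3 and local[0] == "c" and local[1:].isdigit())


def own_text(elem):
    # Text of elem and its descendants, skipping nested components,
    # so a hit is attributed to the innermost component that contains it.
    parts = [elem.text or ""]
    for child in elem:
        if not is_component(child.tag):
            parts.append(own_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


def matching_components(highlighted_ead):
    """Return the id attributes of components whose own text contains a marker."""
    root = ET.fromstring(highlighted_ead)
    return [elem.get("id") for elem in root.iter()
            if is_component(elem.tag) and HL_PRE in own_text(elem)]


sample = """<dsc>
  <c01 id="ser1"><did><unittitle>Correspondence</unittitle></did>
    <c02 id="f1"><did><unittitle>Letters to [[HL]]Smith[[/HL]]</unittitle></did></c02>
    <c02 id="f2"><did><unittitle>Letters to Jones</unittitle></did></c02>
  </c01>
</dsc>"""

print(matching_components(sample))  # ['f1']
```

Only the innermost matching component is reported here; if you'd rather show the whole hierarchy of hits, swap `own_text(elem)` for `"".join(elem.itertext())`. This does nothing about the performance concern of highlighting very large stored fields.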
Possibly a really screwy idea, just throwing it out there. Solr highlighting can be a performance problem on very large stored documents too; I'm not sure whether typical EAD is 'very large' for these purposes, or whether that's something that can be solved by throwing enough RAM at caches. But something about the field collapsing patch makes me nervous: comments about its performance being uncertain on very large result sets, or just nervousness about applying a patch to Solr and counting on someone else to keep it working against Solr master as it develops.
Jonathan