Casey Durfee wrote:
>I am writing a Solr-powered OPAC right now and have not had any performance problems (either indexing or querying) using Solr for both data storage and search. You can indicate in Solr whether you want particular data fields to be stored, indexed or both. So I stick the entire MARC record in a Solr field but don't index on it.
>
>
This is good to know. What I did was strip down my MARCXML records to
only include the fields that are needed for searching. I also flattened
each MARCXML field so that all of its subfields are in one element. For
example, I have an element in my XML document called T245 which holds the
a and b subfields but not the c subfield, etc. This way my indexes are
much more compact, and so is the database, which took my native XML
database implementation from completely worthless to usable. But I am
still not totally happy with the performance.
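
A minimal sketch of that kind of flattening, in Python with only the
standard library. The KEEP mapping, the flatten_record name and the
sample record are illustrative assumptions, not the actual code or data
behind this setup.

```python
import xml.etree.ElementTree as ET

# Standard MARCXML ("MARC21 slim") namespace.
MARC_NS = "http://www.loc.gov/MARC21/slim"
NS = {"marc": MARC_NS}

# Hypothetical mapping: MARC tag -> subfield codes worth keeping for search.
KEEP = {"245": ("a", "b")}


def flatten_record(marcxml: str) -> ET.Element:
    """Collapse the kept subfields of each kept datafield into one T<tag> element."""
    record = ET.fromstring(marcxml)
    flat = ET.Element("record")
    for field in record.iter(f"{{{MARC_NS}}}datafield"):
        tag = field.get("tag")
        codes = KEEP.get(tag)
        if codes is None:
            continue  # field is not needed for searching, drop it
        parts = [
            (sub.text or "").strip()
            for sub in field.findall("marc:subfield", NS)
            if sub.get("code") in codes
        ]
        parts = [p for p in parts if p]
        if parts:
            ET.SubElement(flat, f"T{tag}").text = " ".join(parts)
    return flat


# Made-up sample record for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Example title :</subfield>
    <subfield code="b">an example subtitle /</subfield>
    <subfield code="c">by A. Author.</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="a">Somewhere :</subfield>
  </datafield>
</record>"""

flat = flatten_record(SAMPLE)
print(ET.tostring(flat, encoding="unicode"))
```

Here the 245 field keeps only subfields a and b, joined into a single
T245 element, and the 260 field is dropped entirely because it is not
listed in KEEP.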
>
>Just using Solr has proven to be much faster than doing the search in Solr and then retrieving full data from another database. This also has the advantage of making it so there's only one thing you gotta keep in sync with the ILS. The only data that my OPAC needs to talk to a SQL database for is item-level information, which changes too often to keep synced.
>
My only concern about Lucene is the lack of a standard query language.
I went down the native XML database path because of XQuery and XSL. Does
something like Lucene or Solr offer a strong query language? Is it a
standard? If someone developed a kick-ass text indexer in 2 years
that totally blows Lucene out of the water, would you easily be able to
switch systems?
Andrew