Hi, Michael,

> I created a schema.xml where basically every field is of type "text" for
> the beginning. Do you use specialized types for authors or ISBNs or
> other fields?
I use a different field for every MARC field I want to
search; moreover, there is a UDC notation field, which is
split up into atomic notations, so one complex UDC number
becomes 3+ Solr fields.
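For illustration only (the field and type names below are made up, not Peter's actual schema), a schema.xml fragment along those lines might look like:

```xml
<!-- Hypothetical schema.xml fragment: one field per searched MARC field -->
<field name="title_245a"  type="text"   indexed="true" stored="true"/>
<field name="author_100a" type="text"   indexed="true" stored="true"/>
<field name="isbn_020a"   type="string" indexed="true" stored="true"/>
<!-- one complex UDC notation split into atomic notations -->
<field name="udc_1" type="string" indexed="true" stored="true"/>
<field name="udc_2" type="string" indexed="true" stored="true"/>
<field name="udc_3" type="string" indexed="true" stored="true"/>
```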

> How do you handle multi-value fields? Do you feed everything in a single
> field (like "Smith, James ; Miller, Steve" as I have seen in a pure
> Lucene implementation of a collegue or do you use the multiValued
> feature of SOLR?
I usually create different fields with the same name.
I do it in Lucene as well. There is no problem with
repeating fields (same name, different values, of course).
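In Solr the same effect can be declared in schema.xml with multiValued="true" and then repeating the field in the update message (field name and values here are just an example):

```xml
<!-- schema.xml: allow the field to carry repeated values -->
<field name="author" type="text" indexed="true" stored="true" multiValued="true"/>

<!-- update message: the same field repeated once per author -->
<doc>
  <field name="author">Smith, James</field>
  <field name="author">Miller, Steve</field>
</doc>
```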

> What about boosting? I thought of giving the current year a boost="3.0"
> and then 0.1 less for every year the title is older, down to 1.0 for a
> 21-year-old book. The idea is to have a sort that tends to promote
> recent titles but still respects other aspects. Does this sound
> reasonable or are there other ideas? I would be very interested in an
> actual boosting-scheme from where I could start.
That sounds reasonable.
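A minimal sketch of that scheme in plain Java (the clamp at 1.0 for titles older than 20 years is an assumption from your description; adjust the constants to taste):

```java
// Recency boost: 3.0 for the current year, 0.1 less per year of age,
// never dropping below 1.0.
public class RecencyBoost {
    public static double boostFor(int pubYear, int currentYear) {
        int age = currentYear - pubYear;
        return Math.max(1.0, 3.0 - 0.1 * age);
    }

    public static void main(String[] args) {
        System.out.println(boostFor(2007, 2007)); // current year -> 3.0
        System.out.println(boostFor(1987, 2007)); // 20 years old -> 1.0
    }
}
```

The computed value would then be set as the document (or field) boost at index time.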

> We have a couple of databases that should eventually indexed. Do you
> build one huge database with an additional "database" field or is it
> better to have every database in its own SOLR instance?
Our projects usually build one index from different
sources - but it depends on the nature of your project.
We built an application into which we convert 110+
CD-ROMs (originally in Folio databases) - this covers
2,200,000+ XHTML pages, and there are separate search
forms for the different DBs. It is a Lucene project, not Solr.
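If you do go with one big index, the usual trick is an extra field that tags every record with its source database (the field name "database" here is just an example); a search form can then restrict results with a filter query such as fq=database:catalog:

```xml
<!-- schema.xml: tag every record with its source database -->
<field name="database" type="string" indexed="true" stored="true"/>
```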

> How do you fill the index? Our main database has about 700,000 records
> and I don't know if I should build one huge XML-file and feed that into
> SOLR or use a script that sends one record at a time with a commit after
> every 1000 records or so. Or do something in between and split it into
> chunks of a few thousand records each? What are your experiences? What
> if a record gives an error? Will the whole file be rejected or just
> that one record?
There is a Java command line tool, or you can look at
VuFind's solution. If you can, I suggest preferring a pure
Java solution that writes directly to the Solr index (with
the Solr API), because it is much, much quicker than the
PHP (Rails, Perl) solutions, which go through the web
service (and so pay for the PHP parsing and the HTTP
round trips). The PHP solution does nothing with Solr
directly; it uses the web service, and all that code could
be rewritten in Perl.
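The chunked-feeding idea from your question can be sketched in plain Java; the add/commit hooks below are stand-ins for the real Solr calls (in SolrJ they would be something like server.add(...) and server.commit()), so this only illustrates the batching logic, not the Solr API:

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

// Feed records one at a time, committing every `commitEvery` records,
// plus a final commit for the last partial chunk. Returns the commit count.
public class ChunkedFeeder {
    public static int feed(List<String> records, int commitEvery,
                           Consumer<String> add, Runnable commit) {
        int commits = 0;
        int sent = 0;
        for (String rec : records) {
            add.accept(rec);       // in SolrJ: server.add(toSolrDoc(rec))
            sent++;
            if (sent % commitEvery == 0) {
                commit.run();      // in SolrJ: server.commit()
                commits++;
            }
        }
        if (sent % commitEvery != 0) { // flush the final partial chunk
            commit.run();
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        List<String> records = Collections.nCopies(2500, "record");
        System.out.println(feed(records, 1000, r -> {}, () -> {})); // prints 3
    }
}
```

One record per request with a commit every 1000 or so also has the nice property that a bad record only fails by itself instead of taking a huge XML file down with it.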

> Any other ideas, further reading, experiences...?
See the source files of the solutions based on Solr; there
are several, even in the library scene (PHP, Rails, Python).
More info:
http://del.icio.us/popular/solr


Peter Kiraly
http://www.tesuji.eu