> I created a schema.xml where basically every field is of type "text" to
> begin with. Do you use specialized types for authors or ISBNs or
> other fields?
I use a different field for every MARC field I want to search.
There is also a field for the UDC notation, which is split into
atomic notations, so one complex UDC value becomes 3+ Solr fields.
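For illustration, a minimal schema.xml fragment along these lines (the field
names and the text/string type names are assumptions, to be adjusted to your
own MARC mapping):

```xml
<!-- Hypothetical fragment: one Solr field per searchable MARC field. -->
<fields>
  <!-- exact-match identifier: a non-tokenized string type -->
  <field name="isbn"   type="string" indexed="true" stored="true"/>
  <!-- tokenized, analyzed text for full-text searching -->
  <field name="title"  type="text"   indexed="true" stored="true"/>
  <field name="author" type="text"   indexed="true" stored="true"/>
  <!-- one value per atomic UDC notation component -->
  <field name="udc"    type="string" indexed="true" stored="true"/>
</fields>
```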
> How do you handle multi-value fields? Do you feed everything in a single
> field (like "Smith, James ; Miller, Steve" as I have seen in a pure
> Lucene implementation of a colleague or do you use the multiValued
> feature of SOLR?
I usually create several fields with the same name,
and I do the same in Lucene. Repeated fields (same name,
different values, of course) cause no problems.
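In Solr this maps directly onto the multiValued attribute; a sketch, with an
assumed "author" field:

```xml
<!-- schema.xml: declare the field as multi-valued -->
<field name="author" type="text" indexed="true" stored="true" multiValued="true"/>

<!-- update message: repeat the field element once per value -->
<doc>
  <field name="author">Smith, James</field>
  <field name="author">Miller, Steve</field>
</doc>
```

This avoids packing several names into one delimited string, so each author
is analyzed and matched independently.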
> What about boosting? I thought of giving the current year a boost="3.0"
> and then 0.1 less for every year the title is older, down to 1.0 for a
> 21-year-old book. The idea is to have a sort that tends to promote
> recent titles but still respects other aspects. Does this sound
> reasonable or are there other ideas? I would be very interested in an
> actual boosting scheme from which I could start.
That sounds reasonable.
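As a sketch of that scheme (the 3.0 starting boost, the 0.1 step per year,
and the floor of 1.0 come from the question above; the class and method names
are ours):

```java
public class YearBoost {

    /**
     * Index-time boost for a record: 3.0 for a book published in the
     * current year, minus 0.1 per year of age, never dropping below 1.0.
     */
    public static double boostFor(int publicationYear, int currentYear) {
        double boost = 3.0 - 0.1 * (currentYear - publicationYear);
        return Math.max(1.0, boost);
    }

    public static void main(String[] args) {
        System.out.println(boostFor(2008, 2008)); // current year -> 3.0
        System.out.println(boostFor(1988, 2008)); // 20+ years old -> clamped to 1.0
    }
}
```

You would apply the result as a document boost at indexing time, e.g. the
boost attribute on a doc element in the update message, or Document.setBoost
in a Lucene indexer.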
> We have a couple of databases that should eventually be indexed. Do you
> build one huge database with an additional "database" field or is it
> better to have every database in its own SOLR instance?
Our projects usually build one index from different
sources - but it depends on the nature of your project.
We built an application into which we converted 110+
CD-ROMs (originally in a Folio database) - this covers
2,200,000+ XHTML pages, with separate search forms
for the different DBs. That one is a Lucene project, not Solr.
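If you take the single-index route, the usual trick is an extra field
recording the source database, and then restricting each search form with a
filter query (the field name here is an assumption):

```xml
<!-- schema.xml: hypothetical source marker -->
<field name="database" type="string" indexed="true" stored="true"/>
<!-- at query time: q=title:history&fq=database:main_catalogue -->
```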
> How do you fill the index? Our main database has about 700,000 records
> and I don't know if I should build one huge XML-file and feed that into
> SOLR or use a script that sends one record at a time with a commit after
> every 1000 records or so. Or do something in between and split it into
> chunks of a few thousand records each? What are your experiences? What
> if a record gives an error? Will the whole file be rejected or just
> that one record?
There is a Java command-line tool, or you can look at
VuFind's solution. If you can, I suggest a pure Java
solution that writes directly to the Solr index (via the
Solr API), because it is much, much faster than a PHP
(Rails, Perl) solution, which goes through the web service
and so pays for the PHP parsing and the HTTP round trips.
The PHP solution does nothing with Solr directly; it uses
the web service, and all of that code could be rewritten.
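If you do go through the web service, the update messages are plain XML
POSTed to the /update handler; a chunk of a few thousand records followed by
a separate commit message looks roughly like this (field names assumed):

```xml
<add>
  <doc boost="2.9">
    <field name="id">000001</field>
    <field name="title">An example title</field>
    <field name="year">2007</field>
  </doc>
  <!-- ... a few thousand more docs ... -->
</add>
```

followed by a `<commit/>` message. Posting in moderate chunks rather than one
huge file keeps memory use bounded and limits how much work you lose if one
chunk fails.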
> Any other ideas, further reading, experiences...?
Have a look at the source files of existing Solr-based solutions; there
are several, even in the library world (PHP, Rails, Python).