Hello,

I am just getting my feet wet with SOLR and have a couple of questions
about how others have done certain things.

I created a schema.xml where basically every field is of type "text" for
the beginning. Do you use specialized types for authors or ISBNs or
other fields?
How do you handle multi-value fields? Do you feed everything into a
single field (like "Smith, James ; Miller, Steve", as I have seen in a
pure Lucene implementation of a colleague) or do you use the multiValued
feature of SOLR?
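
To make the multiValued variant concrete, here is a minimal Python sketch
of how I would split such a combined string into repeated <field>
elements inside one <doc>. The field names (id, title, author) and the
helper build_doc are only placeholders for illustration, not taken from
a real schema.

```python
import xml.etree.ElementTree as ET

def build_doc(record_id, title, authors_raw):
    """Build one <doc> element; field names here are hypothetical."""
    doc = ET.Element("doc")
    for name, value in (("id", record_id), ("title", title)):
        ET.SubElement(doc, "field", name=name).text = value
    # One <field name="author"> per author instead of one combined string.
    for author in (a.strip() for a in authors_raw.split(";")):
        if author:
            ET.SubElement(doc, "field", name="author").text = author
    return doc

doc = build_doc("42", "An Example Title", "Smith, James ; Miller, Steve")
print(ET.tostring(doc, encoding="unicode"))
```

This assumes the author field would be declared with multiValued="true"
in schema.xml.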

What about boosting? I thought of giving the current year a boost="3.0"
and then 0.1 less for every year the title is older, down to 1.0 for a
21-year-old book. The idea is to have a sort that tends to promote
recent titles but still respects other aspects. Does this sound
reasonable or are there other ideas? I would be very interested in an
actual boosting-scheme from where I could start.
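
As a sketch of the scheme I have in mind (the function name year_boost
and its parameters are made up for illustration), the boost would fall
linearly with age and be clamped at 1.0, so everything older than the
cutoff gets the same value:

```python
def year_boost(pub_year, current_year, top=3.0, step=0.1, floor=1.0):
    """Linear decay: `top` for the current year, minus `step` per year
    of age, never below `floor`. All defaults are assumptions."""
    age = max(0, current_year - pub_year)  # treat future years as age 0
    return round(max(floor, top - step * age), 1)

# Example with an arbitrary reference year of 2007.
for year in (2007, 2006, 1997, 1980):
    print(year, year_boost(year, 2007))
```

The resulting value could then be written as the boost attribute of each
document at index time.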

We have a couple of databases that should eventually be indexed. Do you
build one huge database with an additional "database" field or is it
better to have every database in its own SOLR instance?

How do you fill the index? Our main database has about 700,000 records
and I don't know if I should build one huge XML-file and feed that into
SOLR or use a script that sends one record at a time with a commit after
every 1000 records or so. Or do something in between and split it into
chunks of a few thousand records each? What are your experiences? What
if a record gives an error? Will the whole file be rejected or just
that one record?
Are there alternatives to the HTTP gateway?
Are there any Perl-scripts around that could help? I built a little
script that uses LWP to feed my test records into the database. It works
but I don't have any error handling yet and the XML creation is very
quick and dirty, so if there is something more mature I would like to
use that.
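
The "in between" variant could look roughly like the following Python
sketch: split the records into batches and build one <add> message per
batch. The record layout, the helper names chunked and add_message, and
the batch size of 1000 are assumptions for illustration; actually posting
each message over HTTP and sending a <commit/> afterwards is left out.

```python
import xml.etree.ElementTree as ET
from itertools import islice

def chunked(records, size):
    """Yield lists of at most `size` records."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def add_message(batch):
    """Serialize one batch as a single <add> message (field names are
    taken from the record keys, which are hypothetical here)."""
    add = ET.Element("add")
    for record in batch:
        doc = ET.SubElement(add, "doc")
        for name, value in record.items():
            ET.SubElement(doc, "field", name=name).text = str(value)
    return ET.tostring(add, encoding="unicode")

# Fake test data standing in for the real database export.
records = [{"id": i, "title": f"Title {i}"} for i in range(2500)]
messages = [add_message(batch) for batch in chunked(records, 1000)]
print(len(messages), "messages built")
```

With batches like this, a failure could at least be narrowed down to one
batch, which could then be re-sent record by record to find the culprit.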

Any other ideas, further reading, experiences...?

I know these are a lot of questions but after the conference last year I
think there is lots of expertise in this group and perhaps I can avoid a
few beginner mistakes with your help.

thanks in advance
- Michael