>> How do you fill the index? Our main database has about 700,000 records,
>> and I don't know whether I should build one huge XML file and feed that
>> into Solr, or use a script that sends one record at a time with a commit
>> after every 1000 records or so. Or do something in between and split it
>> into chunks of a few thousand records each? What are your experiences?
>> What if a record gives an error? Will the whole file be rejected, or
>> just that one record?
>There is a Java command line tool, or you can look at VuFind's
>solution. If you can, I suggest you prefer a pure Java solution
>that writes directly to the Solr index (via the Solr API),
>because it is much faster than the PHP (Rails, Perl) solutions,
>which are based on a web service (and so pay for PHP parsing
>and an HTTP request per post). The PHP solution does nothing
>with Solr directly; it uses the web service, and all of that
>code could just as well be rewritten in Perl.
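For reference, the direct-API approach suggested above looks roughly like
this with the SolrJ client. This is an untested sketch: the core URL and
the "id"/"title" schema fields are made-up placeholders, not taken from
anyone's setup.

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class DirectIndexer {
        public static void main(String[] args) throws Exception {
            // Placeholder URL; point this at your own Solr core.
            SolrClient solr = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/biblio").build();

            for (int i = 1; i <= 700_000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", Integer.toString(i)); // assumed schema field
                doc.addField("title", "Record " + i);    // assumed schema field
                solr.add(doc);

                // Commit periodically, not per document.
                if (i % 1000 == 0) {
                    solr.commit();
                }
            }
            solr.commit(); // final commit for the last partial chunk
            solr.close();
        }
    }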
When you want to use a scripting language to fill the Solr index, rather
than using the Solr API directly, you should consider buffering as an
intermediate solution. It can speed up indexing by orders of magnitude.
Create the XML in your script and keep the documents in memory until you
have 50 or 100 of them, then post them together in a single request.
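In Java, since the sketch above already uses SolrJ, the buffering variant
looks roughly like the following. The Rec type and the stub input list are
hypothetical stand-ins for your real database rows; the URL and field
names are placeholders as before.

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class BufferedIndexer {
        // Hypothetical stand-in for one database record.
        record Rec(String id, String title) {}

        public static void main(String[] args) throws Exception {
            SolrClient solr = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/biblio").build(); // placeholder

            // Stub input; replace with a cursor over the real database.
            List<Rec> input = List.of(new Rec("1", "First"),
                                      new Rec("2", "Second"));

            List<SolrInputDocument> buffer = new ArrayList<>();
            for (Rec r : input) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", r.id());
                doc.addField("title", r.title());
                buffer.add(doc);

                // Post 100 documents in one request, not 100 requests.
                if (buffer.size() >= 100) {
                    solr.add(buffer);
                    buffer.clear();
                }
            }
            if (!buffer.isEmpty()) {
                solr.add(buffer); // flush the remainder
            }
            solr.commit();
            solr.close();
        }
    }

The win comes from amortizing the per-request overhead (HTTP round trip,
request parsing) over many documents, which is exactly what hurts the
one-record-at-a-time scripting approach.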
Attached is a small Ruby script we use to do Solr indexing. It reads
YAML records from standard input, does some processing (buried in our
libraries), buffers the results, and posts after 100 records have been
gathered.
Regards,
Ewout