On 23.11.2007 10:39 Erik Hatcher wrote:

> As far as I know there is no way to avoid POSTing in the XML - no
> direct import of an XML file without HTTP (without getting down and
> dirty and writing to the embedded Solr API, which is a bit
> discouraged for many reasons).

Ok.

> You can also bring data into Solr using the CSV importer.  I highly
> recommend folks take a good look at this route.  It's clean, easy, fast:
> <http://wiki.apache.org/solr/UpdateCSV>

That sounds like what I need. The only problem I see: what about escapes?
I don't know my data well enough to be sure that a given delimiter will
never occur within the data. Most exotic characters will probably be
errors, but I still don't want Solr to choke on them.
Can I use escapes for the separator and/or the encapsulator? If so, is it
\" or "" (backslash or doubling)? I found nothing in the docs about it.
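
To make the question concrete, here is a minimal sketch of how I would generate the file, assuming the importer accepts the common CSV convention of wrapping a value in double quotes and doubling any quote inside it. That assumption is exactly what I am asking about. The function name `rows_to_csv` and the `id`/`title` fields are made up for illustration; the writer is Python's standard `csv` module.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize dicts to CSV text, doubling any embedded quote characters."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=fieldnames,
        delimiter=",",              # separator (assumed to be a comma)
        quotechar='"',              # encapsulator (assumed to be a double quote)
        doublequote=True,           # escape " inside a value as ""
        quoting=csv.QUOTE_MINIMAL,  # quote only values that need it
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical records; the first contains both the separator and the encapsulator.
sample = [
    {"id": "1", "title": 'He said "hi", then left'},
    {"id": "2", "title": "plain"},
]
csv_text = rows_to_csv(sample, ["id", "title"])
print(csv_text)
```

With this convention the writer, not the data, decides where quoting is needed, so I would not have to find a delimiter that never occurs in the records.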

> For 700,000 records, one first nice step to try is to convert that
> data into CSV and feed it into Solr.  Create a CSV file on the file
> system with all those records and use the CSV importer.  I think
> you'll find that the absolute fastest way to bring data in.   But

It even looks like an (almost) direct import without HTTP, since the file
is read straight from the file system and doesn't have to be squeezed
through a socket connection.
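
As far as I understand the wiki page, the request itself is still a small HTTP call that only names the file. A rough sketch of building that call follows, assuming the `stream.file` and `stream.contentType` parameters described on the UpdateCSV page and assuming remote streaming is enabled on the server. The host, port, handler path, file path, and the helper name `build_csv_import_url` are placeholders of mine, not values from this thread.

```python
from urllib.parse import urlencode


def build_csv_import_url(base_url, csv_path):
    """Build an update URL that asks the server to read a local CSV file."""
    params = {
        # Path as seen by the Solr server, not by the client.
        "stream.file": csv_path,
        # Declare the encoding so non-ASCII characters survive the import.
        "stream.contentType": "text/plain;charset=utf-8",
        # Assumption: commit right after loading so the records become searchable.
        "commit": "true",
    }
    return base_url + "?" + urlencode(params)


# Placeholder endpoint and file path.
url = build_csv_import_url(
    "http://localhost:8983/solr/update/csv",
    "/tmp/records.csv",
)
print(url)
```

Only this short URL would travel over the socket; the 700,000 records themselves would be read by the server from disk.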

To Peter: Thanks for the books. I think I will have something to do for
some time now ;-)
To Ewout: Thanks for the script. I will have a look at it, but I think I
will try the CSV route first.

Thanks for the help
-Michael