LISTSERV 16.5 - CODE4LIB Archives

Thanks for the answers, they helped a lot!

> Field types: you'll probably want to index things like titles in two
> fields, one tokenized (text) and one not (string), so that you can
> retrieve and match the full title as well as searching for terms within
> it. See the way the Solr sample app uses "*_exact" fields. You can use
> the copyField setting to avoid having to input the value twice.

What about phrase searches that should match only part of the title,
exactly and in order? For example:
title1: "Web Services with Perl"
title2: "Deploying Services on the Web"
search: title:"Web Services"
This search should find title1 but not title2. Is "text" the correct
type, or do I have to use "string" for this to work?
(Yes, I can test this myself, but I have so much to learn and test that
it would help if someone could answer this offhand.)
And: do you use "text", and perhaps also "string", for all the other
fields?
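
To make the question concrete, here is a minimal Python sketch of the two
matching behaviours being compared. It is not Solr code: tokenize() is an
assumed stand-in for a real analyzer (lowercase, split on whitespace), and
phrase_match() and exact_match() are hypothetical helper names. It only
illustrates that a phrase query over a tokenized field looks for adjacent
terms in order, while an untokenized field compares the whole value.

```python
def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for a real analyzer."""
    return text.lower().split()


def phrase_match(field_value, phrase):
    """True if the phrase's tokens occur adjacently and in order
    (roughly how a phrase query behaves on a tokenized "text" field)."""
    tokens, wanted = tokenize(field_value), tokenize(phrase)
    if not wanted:
        return False
    return any(tokens[i:i + len(wanted)] == wanted
               for i in range(len(tokens) - len(wanted) + 1))


def exact_match(field_value, query):
    """True only if the whole value is identical
    (roughly how an untokenized "string" field behaves)."""
    return field_value == query


titles = {
    "title1": "Web Services with Perl",
    "title2": "Deploying Services on the Web",
}

text_hits = [k for k, v in titles.items() if phrase_match(v, "Web Services")]
string_hits = [k for k, v in titles.items() if exact_match(v, "Web Services")]
print(text_hits)    # ['title1']
print(string_hits)  # []
```

Under these assumptions the tokenized variant finds title1 only, and the
untokenized variant finds nothing, because neither full title equals
"Web Services".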

> The same considerations affect whether you want to use multi-valued
> fields: if you're going to facet on that field, you want distinct
> values, not a concatenated series; if you're only going to do free term
> searching, the concatenation might not be a problem (though you risk
> getting matches on phrase searches like "James Miller" against the
> example you gave below).
>
> If you use boost on the date field the way you suggest, remember you'll
> have to reindex from scratch every year to adjust the boost as items
> age. The sample solrconfig.xml contains an example of date-wrangling to
> get the same effect based on distance from the current date, rather than
> hard-coding the boost into the index.

Thanks for the hints, I will follow them!
Regarding boosting: this was just one example taken from the
requirements at my institution. I am also looking for other ideas on how
to improve search results.
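
The idea of boosting by distance from the current date, rather than
storing a fixed boost in the index, can be sketched in Python as follows.
The reciprocal shape a / (m * x + b) is the one used by Solr's recip
function query; the function name recency_boost and the constants (one
year as the scale) are assumptions chosen only for illustration.

```python
from datetime import date


def recency_boost(pub_date, today, m=1.0 / 365.25, a=1.0, b=1.0):
    """Reciprocal decay a / (m * age_in_days + b).

    Computed at query time from the current date, so nothing has to be
    reindexed as items age. Future dates are treated as age 0.
    """
    age_days = max((today - pub_date).days, 0)
    return a / (m * age_days + b)


today = date(2008, 1, 1)  # assumed "current" date for the example
new_item = recency_boost(date(2007, 6, 1), today)
old_item = recency_boost(date(1998, 6, 1), today)
print(new_item > old_item)  # True: the newer item gets the larger boost
```

Because the age is derived when the query runs, the same index keeps
giving sensible results next year without any changes.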

> Assuming your data structures are the same and you're not talking
> millions of records, I'd be inclined to put everything in one index to
> make cross-searching easier, assuming you want cross-searching. If you
> don't, there's no reason not to have multiple indexes.

I don't need cross-searching (yet), but I was afraid of starting too
many daemons. Is it possible to have multiple indexes with one Solr
server (like you can have multiple databases with one MySQL server)?
And yes, there will be millions of records, at least eventually.

> There is a way to pass Solr a path to a file that it can read from disk
> rather than posting the file. I hunted a bit in the wiki and couldn't
> find it, though; it may still be a patch you have to apply.

That would really help, so if someone can remember where to find such a
tool or command, please post the info.
To Peter Kiraly: Learning at least some Java is on my to-do list, but at
the moment all I know are scripting languages: Perl best, some Python
and PHP (see my other post on this).
But perhaps there is a Java tool around (the same one Peter Binkley
mentions?) that can take a prepared XML (or other text) file and feed it
directly into the index?
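
For comparison, posting prepared XML from a scripting language can be
sketched in Python as below. This is not the file-path mechanism
mentioned above, only the ordinary HTTP post to the update handler. The
URL is an assumed default for a local install, and build_add_xml and
build_update_request are hypothetical helper names; the request is built
but deliberately not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumed location of the update handler; adjust to the actual install.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_xml(docs):
    """Serialize a list of {field: value-or-list} dicts as an <add> message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A list becomes repeated <field> elements (multi-valued field).
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="utf-8")


def build_update_request(xml_bytes, url=SOLR_UPDATE_URL):
    """Prepare (but do not send) an HTTP POST carrying the XML."""
    return urllib.request.Request(
        url,
        data=xml_bytes,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


payload = build_add_xml([{"id": "1", "title": "Web Services with Perl"}])
request = build_update_request(payload)
# urllib.request.urlopen(request) would send it to a running server.
```

A separate post containing <commit/> is needed afterwards before the new
documents become searchable.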

Thanks again
- Michael