Print

Print


On Feb 10, 2015, at 11:46 AM, Erik Hatcher <[log in to unmask]> wrote:

> bin/post -c collection_name /path/to/file.doc

The almost trivial command to index a Word document in Solr, above, is most certainly appealing, but I’m wondering about the underlying index’s schema.

Tika makes every effort to extract as much metadata from Word documents as possible. This metadata includes dates, titles, authors, names of applications, last edit, etc. Some of this data can be very useful. The metadata can be packaged up as an XML file/stream and then sent to Solr for indexing. "Tastes great. Less filling.” But my question is, “To what degree does Solr know what to do with the metadata when the (kewl) command, above, is seemingly so generic? Does one need to create a Solr schema to specifically accommodate the Tika-created metadata, or do such things also come for ‘free’?”

— 
Eric Morgan