Can somebody point me to a good tutorial on how to index Word documents using Solr?
I have a few hundred Microsoft Word documents I want to search. Through the use of the Tika library it seems as if I ought to be able to index my Word documents directly into Solr, but none of the tutorials I have found on the Web are complete. Missing directories. Missing files. Documentation for versions unreleased. Etc.
Put another way, Tika can create a (nice) XHTML file complete with some useful metadata that can all be fed to Solr for indexing, but I can barely get out of the starting gate. Have you indexed Word documents using Solr, and if so, then how?