I found this book helped me get my head around Solr: 
https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-beginner%E2%80%99s-guide.

Chapter 8 explains indexing rich text formats including MS Word.
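For a sense of what the Tika route looks like in practice, here is a minimal sketch that sends one Word file to Solr's ExtractingRequestHandler (Solr Cell), which runs Tika on the server side. It is not taken from the book. It assumes the handler is enabled at /update/extract in solrconfig.xml; the base URL, the core name "docs", and both function names are made up for illustration.

```python
import mimetypes
import urllib.parse
import urllib.request
from pathlib import Path


def build_extract_url(solr_base, core, doc_id, commit=True):
    """Build the /update/extract URL for one document.

    literal.id sets the Solr unique key for the document; commit=true
    makes it searchable right away (slow if done for every file).
    """
    params = {"literal.id": doc_id, "commit": "true" if commit else "false"}
    query = urllib.parse.urlencode(params)
    return f"{solr_base.rstrip('/')}/{core}/update/extract?{query}"


def index_word_file(path, solr_base="http://localhost:8983/solr", core="docs"):
    """POST the raw bytes of a Word file to Solr and return the HTTP status.

    The base URL and core name are assumptions; change them to match
    your own installation.
    """
    path = Path(path)
    url = build_extract_url(solr_base, core, path.name)
    # Tika detects the real format itself; the header is only a hint.
    content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    request = urllib.request.Request(
        url,
        data=path.read_bytes(),
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

With a few hundred files, something like `for p in Path("word_docs").glob("*.doc*"): index_word_file(p)` would walk a (hypothetical) directory. Using the file name as the id is just a convenience here, and it would probably be better to commit once at the end rather than per file.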

Chris Gray
Systems Analyst
519-888-4567, ext. 35764
University of Waterloo

On 15-02-10 11:12 AM, Eric Lease Morgan wrote:
> Can somebody point me to a good tutorial on how to index Word documents using Solr?
>
> I have a few hundred Microsoft Word documents I want to search. Through the use of the Tika library it seems as if I ought to be able to index my Word documents directly into Solr, but none of the tutorials I have found on the Web are complete. Missing directories. Missing files. Documentation for versions unreleased. Etc.
>
> Put another way, Tika can create a (nice) XHTML file complete with some useful metadata that can all be fed to Solr for indexing, but I can barely get out of the starting gate. Have you indexed Word documents using Solr, and if so, then how?
>
> —
> Eric Morgan