Eric,

> 5. Use pdftotext to extract the OCRed text
> from the PDF and index it along with
> the MyLibrary metadata using Solr. [3, 4]

Have you considered using Solr's ExtractingRequestHandler [1] for the PDFs? We're using it at NYPL with pretty great success.

[1] http://wiki.apache.org/solr/ExtractingRequestHandler

Mark A. Matienzo
Applications Developer, Digital Experience Group
The New York Public Library
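
For anyone following along, here is a minimal sketch of what sending a PDF to the ExtractingRequestHandler might look like. It is not NYPL's actual setup. It assumes the handler is mapped at /update/extract, that Solr lives at a base URL such as http://localhost:8983/solr, and that the metadata field names (here "title") exist in your schema. The helper names build_extract_url and post_pdf are made up for illustration. Metadata is passed as literal.<field> parameters so it is indexed alongside the extracted text.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_url(solr_base, doc_id, metadata, commit=True):
    """Build an /update/extract URL (hypothetical helper).

    Each metadata entry becomes a literal.<field> parameter, which Solr
    stores on the document together with the text extracted from the PDF.
    """
    params = [("literal.id", doc_id)]
    # Sort the fields so the resulting URL is deterministic.
    for field, value in sorted(metadata.items()):
        params.append((f"literal.{field}", value))
    if commit:
        params.append(("commit", "true"))
    return f"{solr_base.rstrip('/')}/update/extract?{urlencode(params)}"


def post_pdf(url, pdf_path):
    """POST the raw PDF bytes to Solr and return the HTTP status code."""
    with open(pdf_path, "rb") as fh:
        body = fh.read()
    request = Request(
        url,
        data=body,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )
    with urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Assumed base URL, document id, and field name; adjust to your install.
    url = build_extract_url(
        "http://localhost:8983/solr", "doc1", {"title": "A Book"}
    )
    print(url)
    # post_pdf(url, "scan.pdf")  # uncomment with a real file and a running Solr
```

With this approach the separate pdftotext step goes away, since Solr does the text extraction itself when it receives the file.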