> 5. Use pdftotext to extract the OCRed text
> from the PDF and index it along with
> the MyLibrary metadata using Solr. [3, 4]
Have you considered using Solr's ExtractingRequestHandler for the
PDFs? We're using it at NYPL with pretty great success.
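
For illustration, here is a rough sketch of what posting a PDF to the
handler might look like. It is not our production code. It assumes the
handler is mapped at /update/extract (the path used in the stock Solr
example config), and the core name "mylibrary", the document id, and the
"title" field are hypothetical placeholders. Metadata is passed as
literal.<field> parameters so it is indexed alongside the extracted text.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_url(solr_base, doc_id, metadata, commit=True):
    """Build an /update/extract URL for one document.

    solr_base and the metadata field names are assumptions; they must
    match your own Solr core and schema.
    """
    params = [("literal.id", doc_id)]
    # Pass each metadata value as literal.<field> so Solr stores it as given.
    for field, value in sorted(metadata.items()):
        params.append(("literal." + field, value))
    if commit:
        params.append(("commit", "true"))
    return solr_base.rstrip("/") + "/update/extract?" + urlencode(params)


def index_pdf(solr_base, pdf_path, doc_id, metadata):
    """POST the raw PDF bytes to Solr and return the HTTP status code."""
    with open(pdf_path, "rb") as fh:
        body = fh.read()
    request = Request(
        build_extract_url(solr_base, doc_id, metadata),
        data=body,
        headers={"Content-Type": "application/pdf"},
    )
    with urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical values; only the URL is built here, nothing is sent.
    print(build_extract_url(
        "http://localhost:8983/solr/mylibrary",
        "item-1",
        {"title": "A Book"},
    ))
```

With this approach Solr does the text extraction itself, so the separate
pdftotext step would not be needed for PDFs that already carry a text
layer.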
Mark A. Matienzo
Applications Developer, Digital Experience Group
The New York Public Library