Art makes a good case for Tesseract, which is also integrated with ResourceSpace, a LAMP-based open source DAM application. ResourceSpace already has robust PDF processing via Ghostscript and ImageMagick. It also integrates with collection management solutions like EMu, TMS, and CollectiveAccess. With the caveat that I don't know your workflow exactly, it may be an option to consider.

www.resourcespace.com/

On Thu, Jul 20, 2017 at 8:27 AM, Art Rhyno wrote:

> If you combine Tesseract with other open source tools like ImageMagick (to
> prep images), Olena (to segment column-heavy media like newspapers), and
> Hadoop (if you are working with thousands or millions of pages), it can do
> a lot of heavy lifting.

With Regards,

*Matthew Patulski*
listening / thinking / doing
+1 (616) 361-3951 / linkedin.com/in/mrpatulski
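
P.S. As a rough illustration of the Tesseract plus ImageMagick combination Art describes, here is a minimal Python sketch. It assumes the `convert` and `tesseract` command-line tools are installed and on the PATH (ImageMagick 7 names the binary `magick` instead). The function names, the 40% deskew threshold, and the `_clean.png` file naming are hypothetical choices for illustration, not something ResourceSpace or Art's setup is known to use.

```python
import subprocess
from pathlib import Path


def prep_command(src, dst):
    """Build an ImageMagick call that grayscales and deskews a scan.

    Assumption: the ImageMagick 6 style `convert` binary is available.
    """
    return ["convert", str(src), "-colorspace", "Gray", "-deskew", "40%", str(dst)]


def ocr_command(image, out_base, lang="eng"):
    """Build a Tesseract call; Tesseract writes its text to out_base + '.txt'."""
    return ["tesseract", str(image), str(out_base), "-l", lang]


def ocr_page(src, workdir):
    """Prep one page image, run OCR on it, and return the recognized text."""
    src, workdir = Path(src), Path(workdir)
    cleaned = workdir / (src.stem + "_clean.png")  # hypothetical naming scheme
    out_base = workdir / src.stem
    subprocess.run(prep_command(src, cleaned), check=True)
    subprocess.run(ocr_command(cleaned, out_base), check=True)
    return Path(str(out_base) + ".txt").read_text(encoding="utf-8")
```

Calling `ocr_page("scan.tif", "/tmp/ocr")` would run both tools on a single page. For very large collections, the same two commands could be handed out per page by whatever batch system you use, Hadoop included.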