Art makes a good case for Tesseract, which is also integrated with
ResourceSpace, a LAMP-based open source DAM application. ResourceSpace
already has robust PDF processing via Ghostscript and ImageMagick. It also
integrates with collection management solutions like EMu, TMS, and
CollectiveAccess. With the caveat that I don't know your workflow exactly, it
may be an option to consider.
www.resourcespace.com/
On Thu, Jul 20, 2017 at 8:27 AM, Art Rhyno. <[log in to unmask]> wrote:
> If you combine Tesseract with other open source tools like Imagemagick (to
> prep images), Olena (to segment column-heavy media like newspapers), and
> Hadoop (if you are working with thousands or millions of pages), it can do
> a lot of heavy lifting.
>
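For what it's worth, the ImageMagick-then-Tesseract part of what Art describes
can be sketched in a few lines of Python. This is only a rough sketch under my
own assumptions, not how ResourceSpace does it internally: the function names,
the "-clean.png" suffix, and the choice of grayscale plus deskew as the prep
step are all mine, and it assumes the `convert` and `tesseract` binaries are on
the PATH (ImageMagick 7 installs may need `magick` in place of `convert`).

```python
import subprocess
from pathlib import Path
from typing import List


def prep_command(src: Path, dst: Path) -> List[str]:
    """Build an ImageMagick call that grayscales and deskews a scan."""
    # -deskew 40% is the threshold commonly suggested in the ImageMagick docs.
    return ["convert", str(src), "-colorspace", "Gray", "-deskew", "40%", str(dst)]


def ocr_command(image: Path, out_base: Path, lang: str = "eng") -> List[str]:
    """Build a Tesseract call; Tesseract adds the .txt extension itself."""
    return ["tesseract", str(image), str(out_base), "-l", lang]


def ocr_page(src: Path, workdir: Path) -> Path:
    """Prep one page image, OCR it, and return the path of the text output."""
    workdir.mkdir(parents=True, exist_ok=True)
    cleaned = workdir / (src.stem + "-clean.png")  # hypothetical naming scheme
    out_base = workdir / src.stem
    subprocess.run(prep_command(src, cleaned), check=True)
    subprocess.run(ocr_command(cleaned, out_base), check=True)
    return workdir / (src.stem + ".txt")
```

Calling something like `ocr_page(Path("page-001.tif"), Path("ocr-out"))` (file
names made up) would run both steps for a single page. Olena segmentation or a
Hadoop job for large batches would wrap around this and are not shown here.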
With Regards,
*Matthew Patulski*
listening / thinking / doing
+1 (616) 361-3951 / [log in to unmask] / linkedin.com/in/mrpatulski