I recently released a book-club app called Bookship (www.bookshipapp.com), which lets users scan a page of text from a book so they can post quotes. So my use case is people taking pictures of pages with their phone and OCR-ing them.

I extensively tested Tesseract (an open source project at this point, not a formal Google product, I don't think) and compared it to Google Cloud Vision API's OCR product (https://cloud.google.com/vision/). For my use case, the Google Cloud API blew away Tesseract. Tesseract really struggled with images that weren't perfectly vertical/horizontal and had difficulty dealing with the top and bottom of images (i.e. if a line got cut in half by the edge of the picture, Tesseract produced a few lines of gibberish at the top). The Google Cloud API seems to be nearly flawless at all of that. It was also an order of magnitude faster, and it provides additional features (entity extraction, objectionable content detection, etc.).
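If you want to run the same kind of comparison on your own photos, here is a minimal Python sketch (an illustration only, not the setup used for the results above). It scores each engine by character error rate against text you have typed in by hand, and times each engine over the whole set. The names `compare_engines`, `char_error_rate` and `edit_distance` are made up for this example, and each engine is assumed to be a plain function that takes image bytes and returns a string; you would write thin wrappers around whichever Tesseract binding or Cloud Vision client you use.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical type for this sketch: an OCR engine is any function
# that takes raw image bytes and returns the recognized text.
OcrEngine = Callable[[bytes], str]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_error_rate(expected: str, actual: str) -> float:
    """Edit distance divided by the length of the expected text."""
    if not expected:
        return 0.0 if not actual else 1.0
    return edit_distance(expected, actual) / len(expected)


def compare_engines(
    engines: Dict[str, OcrEngine],
    samples: List[Tuple[bytes, str]],
) -> Dict[str, Dict[str, float]]:
    """Run every engine over (image_bytes, expected_text) samples.

    Returns the mean character error rate and total wall-clock seconds
    for each engine name.
    """
    results: Dict[str, Dict[str, float]] = {}
    for name, ocr in engines.items():
        total_cer = 0.0
        start = time.perf_counter()
        for image_bytes, expected in samples:
            total_cer += char_error_rate(expected, ocr(image_bytes))
        elapsed = time.perf_counter() - start
        results[name] = {
            "mean_cer": total_cer / len(samples),
            "seconds": elapsed,
        }
    return results
```

A lower `mean_cer` is better. Make sure the samples include the hard cases mentioned above (tilted pages, lines cut off at the top or bottom of the photo), since that is where the difference showed up for me.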

Of course, Tesseract is free and the Google product requires licensing, although it provides a limited amount of usage (1000/month, I think) for free.

And of course these results may be due to my use case, or to some mistake in my setup.

Your Mileage May Vary :)

Mark