LISTSERV 16.5 - CODE4LIB Archives

Here's an idea for web-based OCR:

  1. Have Web-based OCR available

  2. Make it easy for people to save
     content in a Web-accessible
     location, something like Box.net

  3. Allow readers (I don't use the
     word "users" anymore) to select
     items from their Web-accessible
     location and have them returned
     as OCR'ed texts

  4. Go a bit further and allow
     readers to do basic text mining
     on their corpus: word and phrase
     tabulations, word clouds,
     concordances, parts-of-speech
     analysis, named-entity
     extraction, etc.
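
The simplest pieces of step 4 can be sketched in a few lines. This is a minimal sketch in Python, assuming the OCR step has already returned plain text. The function names (tokenize, word_counts, phrase_counts, concordance) are made up for illustration, and it uses only the standard library. It covers word and phrase tabulations and a crude concordance. Parts-of-speech analysis and named-entity extraction would need an NLP toolkit and are left out.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_counts(tokens):
    """Tabulate how often each word occurs."""
    return Counter(tokens)


def phrase_counts(tokens, n=2):
    """Tabulate how often each n-word phrase occurs."""
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


def concordance(tokens, keyword, width=3):
    """Return each occurrence of keyword with `width` words of context."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


if __name__ == "__main__":
    # Hypothetical sample standing in for one OCR'ed item.
    sample = "The cat sat on the mat. The cat ran."
    tokens = tokenize(sample)
    print(word_counts(tokens).most_common(3))
    print(phrase_counts(tokens, n=2).most_common(3))
    for line in concordance(tokens, "cat", width=2):
        print(line)
```

The word counts would also be the input for a word cloud, since a cloud is just the same tabulation drawn with font sizes.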

Yeah, I know. Much of this has been done, but it has not been glued together.

--
ELM