+1

https://www.documentcloud.org/opensource

--
Al Matthews
Software Developer, Digital Services Unit
Atlanta University Center, Robert W. Woodruff Library
email: [log in to unmask]; office: 1 404 978 2057

On 10/15/13 4:23 PM, "Arash.Joorabchi" <[log in to unmask]> wrote:

>Eric,
>
>You might want to consider using http://www.documentcloud.org to host
>your users' documents. That would also take care of
>privacy/authentication concerns. I know of a project in the journalism
>domain (http://overview.ap.org/) which does that.
>
>As far as I remember they do provide an API interface and do some named
>entity recognition as well.
>
>Regards,
>Arash
>
>-----Original Message-----
>From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
>Eric Lease Morgan
>Sent: 11 October 2013 18:58
>To: [log in to unmask]
>Subject: Re: [CODE4LIB] pdf2txt
>
>On Oct 11, 2013, at 1:49 PM, Matthew Sherman <[log in to unmask]>
>wrote:
>
>>> For a limited period of time I am making publicly available a
>>> Web-based program called PDF2TXT -- http://bit.ly/1bJRyh8
>>
>> Very slick, good work. I can see where this tool can be very helpful.
>> It does have some issues with some characters, but this is rather
>> common with most systems.
>
>Again, thank you for the support. Yes, there are some escaping issues to
>be resolved. "Release early. Release often." I need help with the
>graphic design in general.
>
>Here's an enhancement I thought of:
>
>  1. allow readers to authenticate
>  2. allow readers to upload documents
>  3. documents get saved in readers' cache
>  4. allow interface to list documents in the cache
>  5. provide text mining services against reader-selected documents
>  6. go to Step #1
>
>It would also be cool if I could figure out how to finish the
>installation of Tesseract to enable OCRing. [1]
>
>[1] OCRing -
>http://serials.infomotions.com/code4lib/archive/2013/201303/1554.html
>
>--
>Eric Morgan
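
For what it's worth, the enhancement loop in Eric's message (upload, save to a per-reader cache, list, mine) could be sketched roughly as follows. This is only an illustration under stated assumptions, not how PDF2TXT or DocumentCloud works: the `ReaderCache` class and its method names are hypothetical, authentication (step 1) is assumed to happen elsewhere so a reader is just a string, the cache lives in memory, and "text mining" is reduced to a simple word count.

```python
import re
from collections import Counter


class ReaderCache:
    """Hypothetical in-memory store mapping reader -> {title: text}."""

    def __init__(self):
        self._docs = {}

    def upload(self, reader, title, text):
        # Steps 2-3: accept a document and save it in the reader's cache.
        self._docs.setdefault(reader, {})[title] = text

    def list_documents(self, reader):
        # Step 4: list the titles in the reader's cache, sorted by name.
        return sorted(self._docs.get(reader, {}))

    def top_words(self, reader, titles, n=5):
        # Step 5: a stand-in text mining service that counts words
        # across the documents the reader selected.
        counts = Counter()
        for title in titles:
            text = self._docs[reader][title].lower()
            counts.update(re.findall(r"[a-z']+", text))
        return counts.most_common(n)


if __name__ == "__main__":
    cache = ReaderCache()
    cache.upload("reader1", "a.txt", "the cat and the hat")
    cache.upload("reader1", "b.txt", "the dog")
    print(cache.list_documents("reader1"))
    print(cache.top_words("reader1", ["a.txt", "b.txt"], n=3))
```

A real service would swap the dictionary for persistent storage and put a proper login in front of it; the point here is only the shape of the upload, list, and mine steps.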