
+1

https://www.documentcloud.org/opensource

--
Al Matthews

Software Developer, Digital Services Unit
Atlanta University Center, Robert W. Woodruff Library
email: [log in to unmask]; office: 1 404 978 2057





On 10/15/13 4:23 PM, "Arash.Joorabchi" <[log in to unmask]> wrote:

>Eric,
>
>You might want to consider using http://www.documentcloud.org to host
>your users' documents. That would also take care of
>privacy/authentication concerns. I know of a project in the journalism
>domain (http://overview.ap.org/) that does this.
>
>As far as I remember, they provide an API and do some named
>entity recognition as well.
>
>Regards,
>Arash
>
>-----Original Message-----
>From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
>Eric Lease Morgan
>Sent: 11 October 2013 18:58
>To: [log in to unmask]
>Subject: Re: [CODE4LIB] pdf2txt
>
>On Oct 11, 2013, at 1:49 PM, Matthew Sherman <[log in to unmask]>
>wrote:
>
>>> For a limited period of time I am making publicly available a
>>> Web-based program called PDF2TXT -- http://bit.ly/1bJRyh8
>>
>> Very slick, good work.  I can see where this tool can be very helpful.
>
>> It does have some issues with some characters, but this is rather
>> common with most systems.
>
>Again, thank you for the support. Yes, there are some escaping issues to
>be resolved. "Release early. Release often." I need help with the
>graphic design in general.
>
>Here's an enhancement I thought of:
>
>  1. allow readers to authenticate
>  2. allow readers to upload documents
>  3. documents get saved in the reader's cache
>  4. allow the interface to list documents in the cache
>  5. provide text mining services against reader-selected documents
>  6. go to Step #1
>
>It would also be cool if I could figure out how to finish the
>installation of Tesseract to enable OCRing. [1]
>
>[1] OCRing -
>http://serials.infomotions.com/code4lib/archive/2013/201303/1554.html
>
>--
>Eric Morgan