Do you need OCR?
This script =>
http://bookscanner.pbworks.com/w/page/45609343/Homer%20bash%20script
will OCR a directory of TIFFs (using Tesseract) and build a PDF using
PDFBeads.
It's a little old, but I still use it pretty much every day. I think you'll
need to have Ruby 1.9 installed, since the PDFBeads library uses Hpricot.
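For a rough idea of what the OCR half of that amounts to, here is a minimal Python sketch. It is not the Homer script itself (which is bash), just an illustration under some assumptions: the function names are made up for this example, the TIFFs are assumed to sort into page order by filename, and the PDF-building step is left out.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_commands(tiff_dir):
    """Return one Tesseract command per TIFF, in sorted-filename order."""
    pages = sorted(
        p for p in Path(tiff_dir).iterdir()
        if p.suffix.lower() in {".tif", ".tiff"}
    )
    # "tesseract IMAGE OUTPUTBASE hocr" writes an hOCR file next to the
    # image; the output file's extension varies by Tesseract version.
    return [["tesseract", str(p), str(p.with_suffix("")), "hocr"] for p in pages]


def ocr_directory(tiff_dir):
    """Run Tesseract over every TIFF; raises if any page fails."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    for cmd in build_ocr_commands(tiff_dir):
        subprocess.run(cmd, check=True)
```

Calling ocr_directory("scans/") on a (hypothetical) directory of page images would leave one hOCR file beside each TIFF, which is the input a tool like PDFBeads expects alongside the images.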
There are lots of document viewers/book widgets/page turners...the Internet
Archive one is good. I also really like the NYTimes Document Viewer (
https://github.com/documentcloud/document-viewer ). The DocumentCloud
people also have something to rip your PDFs apart and put them into the
viewer ( https://github.com/documentcloud/docsplit )
On Fri, Nov 8, 2013 at 8:23 PM, Karen Coyle <[log in to unmask]> wrote:
> +1 for the viewer concept, and I'll add that viewing & downloading meet
> different needs and should both be offered if possible. (said because of
> recently having had to download huge PDFs just to glance at a few pages).
>
> kc
>
>
> On 11/8/13 11:10 AM, Edward Summers wrote:
>
>> It is sad to me that converting to PDF for viewing off the Web seems like
>> the answer. Isn’t there a tiling viewer (like Leaflet) that could be used
>> to render jpeg derivatives of the original tif files in Omeka?
>>
>> For an example of using Leaflet (usually used for working with maps) in
>> this way checkout NYTimes Machine Beta:
>>
>> http://apps.beta620.nytimes.com/timesmachine/1969/07/20/issue.html
>>
>> //Ed
>>
>> On Nov 8, 2013, at 2:00 PM, Kyle Banerjee <[log in to unmask]>
>> wrote:
>>
>>> We are in the process of migrating our digital collections from CONTENTdm
>>> to Omeka and are trying to figure out what to do about the compound objects
>>> -- the vast majority of which are digitized books.
>>>
>>> The source files are actually hi res tiffs but since ginormous objects
>>> broken into hundreds of pieces (each of which can be well over 100MB in
>>> size) aren't exactly friendly to use, we'd like to stitch them into
>>> individual pdf's that can be viewed more conveniently
>>>
>>> My game plan is to simply have a script pull the files down as jpegs which
>>> can be fed to imagemagick which can theoretically do everything I need.
>>> However, I've never actually done anything like this before, so I wanted to
>>> see if there's a method that people have used for combining lots of images
>>> into pdfs that works particularly well. Thanks,
>>>
>>> kyle
>>>
>>
> --
> Karen Coyle
> [log in to unmask] http://kcoyle.net
> m: 1-510-435-8234
> skype: kcoylenet
>