As someone who works on document recognition, I have to disagree. You should always keep an uncompressed original around, since you can never recover it without (often expensive) re-imaging. JPEG, or any other type of lossy compression, introduces artifacts that don't look "too bad" to the human eye but have a significant effect on OCR quality. Once you have discarded your originals, that loss is permanent.
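To make the "can never recover it" point concrete, here is a toy Python sketch. It is an illustration under loud assumptions, not a model of any real codec: the hypothetical `quantize` function stands in for a lossy encoder by snapping raw pixel values to coarse steps, whereas real JPEG quantizes frequency (DCT) coefficients. The point it shows is that a lossy step is many-to-one, so different originals can end up as identical compressed data.

```python
# Toy model, not real JPEG: JPEG quantizes frequency (DCT) coefficients,
# while this sketch quantizes raw 8-bit sample values directly. It only
# illustrates that a lossy, many-to-one step throws information away.


def quantize(samples: list[int], step: int) -> list[int]:
    """Snap each 8-bit sample to the nearest multiple of `step` (lossy)."""
    return [min(255, round(s / step) * step) for s in samples]


if __name__ == "__main__":
    scan_a = [100, 101, 102, 103]  # hypothetical pixel values
    scan_b = [97, 98, 99, 95]      # a different hypothetical original
    lossy_a = quantize(scan_a, 16)
    lossy_b = quantize(scan_b, 16)
    # Both originals collapse to the same compressed data, so no decoder
    # can tell which one it came from: the original is unrecoverable.
    print(lossy_a, lossy_b, lossy_a == lossy_b)
    # prints: [96, 96, 96, 96] [96, 96, 96, 96] True
```

Small per-pixel differences like these are exactly the kind of detail an OCR engine can depend on, even when the eye doesn't notice them.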

Big files are clunky to work with, which is why you should have an automated way of producing compressed surrogate copies for general use. But as any archivist will tell you, a photocopy is not a replacement for the original.
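An automated surrogate workflow could start from something like this minimal Python sketch. It assumes the masters are `.tif`/`.tiff` files under one directory and that surrogates go into a separate, mirrored directory; the directory names and the `plan_surrogates` helper are hypothetical, and the actual encoding step is left to whatever imaging tool you already use.

```python
# Minimal sketch: plan compressed access copies ("surrogates") for every
# TIFF master, mirroring the folder layout into a separate directory so the
# masters are never overwritten. The layout and the helper name
# plan_surrogates are hypothetical; actual JPEG/PDF encoding is left out.
import tempfile
from pathlib import Path

MASTER_SUFFIXES = {".tif", ".tiff"}


def plan_surrogates(master_dir: Path, surrogate_dir: Path,
                    suffix: str = ".jpg") -> list[tuple[Path, Path]]:
    """Return (master, surrogate) path pairs for every TIFF under master_dir."""
    pairs = []
    for master in sorted(master_dir.rglob("*")):
        if master.is_file() and master.suffix.lower() in MASTER_SUFFIXES:
            relative = master.relative_to(master_dir)
            pairs.append((master, (surrogate_dir / relative).with_suffix(suffix)))
    return pairs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        masters = Path(tmp, "masters")
        (masters / "box1").mkdir(parents=True)
        (masters / "box1" / "page001.tif").write_bytes(b"")
        for src, dst in plan_surrogates(masters, Path(tmp, "access")):
            print(src.relative_to(tmp), "->", dst.relative_to(tmp))
```

Keeping surrogates in their own tree means they can be regenerated or thrown away at will, while the masters stay untouched.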


On 2013-04-27, at 7:17 PM, Wilhelmina Randtke wrote:

> Yes, exactly.  You will lose some of the image quality.  If you convert to
> a compressed format and then back to TIFF, you get the TIFF format back,
> but you can't get back the original file.
> Stop and think:  What are your long-term goals?
> Big files are clunky to work with.  I'm guessing that's why you don't want
> TIFF.  In my experience, files big enough to be clunky are discarded within
> a few years, regardless of the intentions when they were prepped.  If you
> want to avoid big files, then your best bet is to assess and test the file
> you will actually keep and do the best job you can with it.  So, if you
> want to rerun OCR in a few years when the recognition will be better, then
> make your PDFs in such a way that you can get decent OCR out of them today,
> and plan to rerun it on those files, not on the (discarded) originals.
> Don't expect reformatting to get you any better image quality later.
> -Wilhelmina Randtke
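Wilhelmina's "you can get the format, but you can't go back to the original file" can be shown with another toy Python sketch. Assumptions: the hypothetical `quantize` again stands in for a lossy encoder such as JPEG, and `zlib` (Deflate) stands in for lossless TIFF compression, since TIFF can store Deflate- or LZW-compressed data. The lossless container faithfully returns what it was given, which is already the degraded data.

```python
# Toy model of "convert to a compressed format, then back to TIFF".
# quantize() is a hypothetical stand-in for a lossy encoder such as JPEG;
# zlib (Deflate) stands in for a lossless TIFF compression. Neither is a
# real image codec here.
import zlib


def quantize(samples: bytes, step: int) -> bytes:
    """Lossy step: snap each byte to the nearest multiple of `step`."""
    return bytes(min(255, round(b / step) * step) for b in samples)


def round_trip(original: bytes, step: int) -> bytes:
    """Lossy encode, then store losslessly and read it back ("back to TIFF")."""
    lossy = quantize(original, step)
    return zlib.decompress(zlib.compress(lossy))


if __name__ == "__main__":
    original = bytes(range(90, 110))
    restored = round_trip(original, 16)
    # The lossless container returns exactly what it was given...
    print(restored == quantize(original, 16))  # True
    # ...but what it was given was already degraded.
    print(restored == original)                # False
```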
> On Fri, Apr 26, 2013 at 3:19 PM, James Gilbert wrote:
>> I'm by no means an expert in the math behind image format conversions...
>> but:
>> When converting TIFF to JPG: TIFF is usually uncompressed (or losslessly
>> compressed), and JPG uses lossy compression.
>> When converting back, wouldn't the original TIFF quality be lost, leaving
>> only the quality of the last JPG (with further degradation each time this
>> process occurs)?
>> James Gilbert, BS, MLIS
>> Systems Librarian
>> Whitehall Township Public Library
>> 3700 Mechanicsville Road
>> Whitehall, PA 18052
>> 610-432-4339 ext: 203
>> -----Original Message-----
>> From: Code for Libraries On Behalf Of Roy
>> Sent: Friday, April 26, 2013 4:15 PM
>> Subject: Re: [CODE4LIB] tiff2pdf, then back to pdf?
>> If you can stand an extra step, Ed, there are tools to convert PDF to JPG
>> images, and from there it shouldn't be too hard to get TIFF output. Search
>> for "convert PDF to image" to get started. There are also tools that run
>> locally rather than only online, which I'm pretty sure is what you're after.
>> Roy Zimmer
>> Western Michigan University
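For the local-tool route Roy describes, the poppler-utils command-line tools are one common option; this sketch assumes they are installed and on your PATH. It only builds the command lines: `pdfimages -all` writes embedded JPEG, JPEG 2000, JBIG2 and CCITT data in its stored form and other images as PNG, while `pdftoppm -tiff -r` renders each page directly to TIFF, which skips the intermediate (lossy) JPG step. The helper names and the 300 dpi default are illustrative choices. Either way you get a new file written from whatever image data is in the PDF, not the original TIFF file itself.

```python
# Sketch only: builds poppler-utils command lines (assumed installed and on
# PATH) for getting images back out of a PDF. Helper names and the 300 dpi
# default are hypothetical choices, not part of poppler.
import subprocess
from pathlib import Path


def extract_embedded_cmd(pdf: Path, out_root: Path) -> list[str]:
    """pdfimages -all: keep JPEG/JPEG 2000/JBIG2/CCITT data as stored, others as PNG."""
    return ["pdfimages", "-all", str(pdf), str(out_root)]


def render_tiff_cmd(pdf: Path, out_root: Path, dpi: int = 300) -> list[str]:
    """pdftoppm -tiff: render every page directly to TIFF at `dpi`."""
    return ["pdftoppm", "-tiff", "-r", str(dpi), str(pdf), str(out_root)]


def run(cmd: list[str]) -> None:
    """Execute a command, raising CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    pdf = Path("scan.pdf")  # hypothetical input file
    print(" ".join(render_tiff_cmd(pdf, Path("scan"))))
    # run(render_tiff_cmd(pdf, Path("scan")))  # uncomment to actually convert
```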
>> On 4/26/2013 4:08 PM, Edward M. Corrado wrote:
>>> Hi All,
>>> I need to batch-convert many TIFF images to PDF. I'd then like
>>> to be able to discard the TIFF images, but I can only do that if I can
>>> re-create the original TIFF from the PDF. Is this possible? If so,
>>> with what tools, and how?
>>> tiff2pdf seems like a possible solution, but I can't find a
>>> corresponding "pdf2tif" program that reverses the process.
>>> Any ideas?
>>> Edward
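Edward's condition ("only if I can create the original TIFF again") is testable if you record checksums before converting. This minimal Python sketch assumes libtiff's `tiff2pdf` is on your PATH and uses its `-o` option to name the output PDF; the helper names and the manifest filename `masters.sha256.json` are hypothetical. A regenerated file counts as the original only if its SHA-256 matches the recorded one.

```python
# Sketch: batch tiff2pdf (from libtiff, assumed on PATH) plus a SHA-256
# manifest of the masters. Helper names and the manifest filename are
# hypothetical. A regenerated file counts as the original only if its
# checksum matches the one recorded here.
import hashlib
import json
import subprocess
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large TIFFs don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tiff2pdf_cmd(tiff: Path, pdf: Path) -> list[str]:
    """tiff2pdf writes the PDF named by -o from the given TIFF."""
    return ["tiff2pdf", "-o", str(pdf), str(tiff)]


def convert_batch(tiff_dir: Path, pdf_dir: Path,
                  execute: bool = False) -> dict[str, str]:
    """Record a checksum for each TIFF, then run (or just print) tiff2pdf."""
    pdf_dir.mkdir(parents=True, exist_ok=True)
    tiffs = sorted(p for p in tiff_dir.iterdir()
                   if p.is_file() and p.suffix.lower() in {".tif", ".tiff"})
    manifest = {p.name: sha256_of(p) for p in tiffs}
    for tiff in tiffs:
        cmd = tiff2pdf_cmd(tiff, pdf_dir / (tiff.stem + ".pdf"))
        if execute:
            subprocess.run(cmd, check=True)
        else:
            print("dry run:", " ".join(cmd))
    (pdf_dir / "masters.sha256.json").write_text(json.dumps(manifest, indent=2))
    return manifest


def is_identical(candidate: Path, expected_sha256: str) -> bool:
    """True only if `candidate` is byte-for-byte the recorded original."""
    return sha256_of(candidate) == expected_sha256


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "tiffs")
        src.mkdir()
        (src / "page001.tif").write_bytes(b"placeholder, not a real TIFF")
        # execute=False: only record checksums; set True to call tiff2pdf.
        print(convert_batch(src, Path(tmp, "pdfs")))
```

A TIFF regenerated from the PDF is very unlikely to pass `is_identical`, because a new file gets written even when the image data survives. That is the practical reason the replies above recommend keeping the masters.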