On Sat, Apr 27, 2013 at 9:37 PM, Andrew Hankinson wrote:

> As someone who works on document recognition, I have to disagree. You
> should always keep an uncompressed original around, since you can never
> recover it without (often expensive) re-imaging. JPEG, or any other type of
> lossy compression, introduces artifacts that don't look "too bad" by the
> human eye, but have a significant effect on the quality of OCR. You can
> never recover this after you have discarded your originals.
>
> Big files are clunky to work with, which is why you should have an
> automated way of producing surrogate, compressed copies for general use,
> but like any archivist will tell you, a photocopy is not a replacement for
> the original.

All true, but keeping "just in case" copies of uncompressed files around has significant disadvantages unless you have the resources to deal with them. Any archivist will tell you they need the uncompressed files. However, many of them don't have the disk space, bandwidth, staff resources, etc. to deal with these files and wind up doing things that are far more dangerous, like just having files sitting around on cheap external HDs.

Every choice people make is about loss. Equipment, optics, lighting, you name it. But for some reason, the instant we're talking about bits of data on a disk, people plan as though capacity were unlimited when most archives are severely under-resourced.

If you only have to deal with a few small projects, keeping uncompressed images is no big deal. But let's suppose you have a million pages or more -- this introduces a completely different cost structure that permanently affects what resources you'll have for other projects in the future.

Objectives and available resources need to drive decisions unless we believe that the best plan is to do what we'd do in an ideal world until resources run out.

kyle