If you didn't optimize the images before making the PDF, you can go into
Preflight and do things like convert to grayscale, convert to black and
white, etc. You can also select which pages to do it on, so you can knock
down the resolution and color depth for text pages, but leave them high for
pages with images.
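To get a feel for how much resolution and color depth matter, here is a rough back-of-the-envelope sketch in Python. It only estimates the raw, uncompressed size of one scanned page; the function name and the page dimensions are my own illustrative assumptions, and real PDFs will be smaller once image compression is applied.

```python
def raw_page_bytes(width_in, height_in, ppi, bits_per_pixel):
    """Estimate the uncompressed size of one scanned page in bytes.

    Ignores row padding, headers, and any compression, so treat the
    result as an upper-bound ballpark rather than an exact file size.
    """
    pixels = round(width_in * ppi) * round(height_in * ppi)
    return (pixels * bits_per_pixel + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    # Hypothetical letter-size page: 24-bit color at 600 ppi
    # versus 1-bit black and white at 300 ppi.
    color = raw_page_bytes(8.5, 11, 600, 24)
    bitonal = raw_page_bytes(8.5, 11, 300, 1)
    print(f"color 600 ppi:   {color:,} bytes")
    print(f"bitonal 300 ppi: {bitonal:,} bytes")
    print(f"ratio:           {color / bitonal:.0f}x")
```

Before any compression, the color page works out to roughly 96 times the size of the black and white one, which is why it pays to be selective about which pages keep full color.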
OCR will store text on a separate layer, so if you flatten layers, run OCR
afterward. Also, in Acrobat Pro 9, if you do OCR as a batch processing
function, then it takes almost no staff time. You just leave the computer
churning away for days or weeks, and it can do all PDFs in folders and
subfolders. So, start it on the files before a long weekend, and come back
to the project when your backlog is processed.
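If you don't have Acrobat's batch function handy, the same "walk every folder and subfolder unattended" idea can be sketched in Python. To be clear about the assumptions: this swaps in Ghostscript (mentioned elsewhere in this thread) for Acrobat, it only downsamples and optionally converts to grayscale, and it does not do OCR. The function names, the 150 ppi default, and the folder names are all hypothetical, and it assumes the `gs` executable is on your PATH.

```python
import subprocess
import sys
from pathlib import Path


def find_pdfs(root):
    """Return every PDF under root, including subfolders, in sorted order."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def build_gs_command(src, dst, dpi=150, grayscale=False):
    """Build a Ghostscript command line that rewrites src to dst at lower resolution."""
    cmd = [
        "gs", "-sDEVICE=pdfwrite", "-dNOPAUSE", "-dBATCH", "-dQUIET",
        "-dDownsampleColorImages=true", f"-dColorImageResolution={dpi}",
        "-dDownsampleGrayImages=true", f"-dGrayImageResolution={dpi}",
    ]
    if grayscale:
        cmd += ["-sColorConversionStrategy=Gray", "-dProcessColorModel=/DeviceGray"]
    cmd += [f"-sOutputFile={dst}", str(src)]
    return cmd


def compress_tree(root, out_root, dpi=150, grayscale=False):
    """Process every PDF under root into a mirrored tree under out_root.

    Keep out_root outside root so outputs are not picked up as inputs.
    Originals are never modified.
    """
    root, out_root = Path(root), Path(out_root)
    for src in find_pdfs(root):
        dst = out_root / src.relative_to(root)
        if dst.exists():
            continue  # already done on an earlier run, so the job can be resumed
        dst.parent.mkdir(parents=True, exist_ok=True)
        result = subprocess.run(build_gs_command(src, dst, dpi, grayscale))
        if result.returncode != 0:
            dst.unlink(missing_ok=True)  # drop partial output so a rerun retries it
            print(f"failed: {src}", file=sys.stderr)


if __name__ == "__main__":
    # Hypothetical folder names; replace with your own.
    compress_tree("scans", "scans_small", dpi=150, grayscale=False)
```

Because finished files are skipped, you can stop the job and restart it later without redoing the whole backlog. Spot-check a few outputs before trusting the settings on a whole collection.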
By the way, for long-term archival purposes, you are better off losing
some quality in the PDFs as a trade-off for a manageable file size. If you
compress, then uncompressing at a later date may be a problem. Being able
to uncompress in the future should be a primary factor in selecting a
compression program.
-Wilhelmina Randtke
On Oct 24, 2012 12:09 PM, "danielle plumer" <[log in to unmask]> wrote:
> As you probably know, you can compress PDFs by compressing or flattening
> the layers (most useful for born-digital materials, such as artwork) or by
> applying a compression algorithm to the underlying images for PDFs
> assembled from digitized images, which seems to be what you're doing.
> Reducing the image size (pixels) and bit depth prior to assembling images
> in a PDF (i.e., don't start with your 800ppi TIFF master) can make a
> dramatic difference in the total size of the PDF. Beyond that, lossless and
> lossy compression algorithms can reduce the size of the underlying image
> files, with different techniques working well on different types of images.
> IrfanView and Ghostscript can help with this. LZW is one of the more common
> lossless compression algorithms for TIFF images. JPEG2000 also offers good
> lossless compression.
>
> In addition to LuraTech, there's at least one other proprietary PDF
> compression system, developed by SAFER Inc. (http://www.saferinc.com/).
> Based on a conversation with someone from the company about 18 months ago, they
> use algorithms that do automatic edge detection and background detection,
> applying compression non-uniformly to regions that appear to contain little
> information. At the time of this conversation, they weren't able to give me
> any white papers or peer-reviewed articles describing the algorithms used,
> which made me hesitant about recommending the system for anything remotely
> archival, though they claimed it was lossless. For use copies, though, the
> software does work very well, and file size reduction is dramatic. I don't
> know anything about pricing. LuraTech may use something similar in their
> "Mixed Raster Content (MRC)" or "layered" compression. As far as I know,
> IrfanView and Ghostscript don't include algorithms to do anything similar.
>
> Danielle Cunniff Plumer
> dcplumer associates
> www.dcplumer.com
>
>
>
> > > -----Original Message-----
> > > From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> > > Nathan Tallman
> > > Sent: Wednesday, October 24, 2012 10:29 AM
> > > To: [log in to unmask]
> > > Subject: [CODE4LIB] PDF Compression
> > >
> > > Can anyone recommend some good PDF compression software? Preferably
> > > open-source or low-cost. We're scanning archival collections and the PDFs
> > > can be quite large for a single folder. The folder may be thick or thin,
> > > and contain a mix of text and images. We've fiddled with various Acrobat
> > > settings for getting the file size down, but we haven't found a good
> > > balance between quality and file size. (Plus, these need to be OCR'ed; so
> > > far we've been doing that in Acrobat.)
> > >
> > > We were looking at LuraTech PDF Compressor, but the cost for an enterprise
> > > license is pretty high. It did do an excellent job though.
> > >
> > > Thanks,
> > > Nathan
> > >
> >
>