On Aug 7, 2012, at 1:23 AM, Yong Tang <[log in to unmask]> wrote:

> First of all, what tool or tools do you use to manipulate PDF files 
> directly in a script? I tried some Perl modules such as CAM::PDF and 
> PDF::API2. The results were not pretty. The original text format was lost.

Yong, what type of manipulation do you want to do? What is your goal? Extract the plain text of a PDF document? Read the PDF document's metadata? Group the PDF documents into similar piles? While I haven't done any PDF document metadata reading, I'm sure there are Perl modules supporting these functions. Regarding the extraction of plain text, you have already gotten a number of suggestions. Personally, I use a binary called pdftotext (a part of the venerable Xpdf -- http://www.foolabs.com/xpdf/download.html) and use Perl's system command to execute it.
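
For illustration only, here is a rough sketch of the same shell-out approach, written in Python rather than Perl. It assumes pdftotext is installed and on your PATH. The function names (build_command, pdf_to_text) and the keep_layout parameter are made up for this example. The -layout option asks pdftotext to keep the original physical layout, which may help with the formatting loss you describe.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(pdf_path, txt_path=None, keep_layout=True):
    """Assemble the pdftotext argument list without running anything.

    Hypothetical helper: if no output name is given, the text file
    sits next to the PDF with a .txt extension.
    """
    pdf_path = Path(pdf_path)
    if txt_path is None:
        txt_path = pdf_path.with_suffix(".txt")
    command = ["pdftotext"]
    if keep_layout:
        # -layout keeps the original physical layout of the page text
        command.append("-layout")
    command += [str(pdf_path), str(txt_path)]
    return command


def pdf_to_text(pdf_path, txt_path=None, keep_layout=True):
    """Run pdftotext on one PDF and return the path of the text file."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    command = build_command(pdf_path, txt_path, keep_layout)
    # check=True raises CalledProcessError if pdftotext exits non-zero
    subprocess.run(command, check=True)
    return Path(command[-1])
```

Calling pdf_to_text("report.pdf") would, under these assumptions, write report.txt beside the PDF. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.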

-- 
Eric Morgan