Sorry for the short answer. Maybe pdftk will work for you.
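
In case it helps, here is a minimal sketch of one way pdftk could feed a sorting script. It assumes pdftk is installed and on the PATH, and it relies on pdftk's dump_data operation, which prints a PDF's metadata (InfoKey/InfoValue pairs and NumberOfPages) to standard output. pdftk does not extract the body text, so this only helps if the files carry usable metadata such as a title or author. The function names parse_dump_data, read_pdf_info and summarize_folder are made up for illustration, and the sketch is in Python rather than Perl.

```python
import subprocess
from pathlib import Path


def parse_dump_data(text):
    """Parse the text printed by `pdftk FILE dump_data` into a dict.

    Collects InfoKey/InfoValue pairs (Title, Author, ...) and the
    NumberOfPages field; every other line is ignored.
    """
    info = {}
    key = None
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # marker lines such as "InfoBegin" carry no value
        value = value.strip()
        if name == "InfoKey":
            key = value
        elif name == "InfoValue" and key is not None:
            info[key] = value
            key = None
        elif name == "NumberOfPages":
            info["NumberOfPages"] = int(value)
    return info


def read_pdf_info(pdf_path):
    """Run pdftk on one file (assumes pdftk is on the PATH) and parse its metadata."""
    result = subprocess.run(
        ["pdftk", str(pdf_path), "dump_data"],
        capture_output=True,
        text=True,
        check=True,  # raise if pdftk fails, e.g. on a damaged file
    )
    return parse_dump_data(result.stdout)


def summarize_folder(folder):
    """Return {file name: metadata dict} for every .pdf directly inside folder."""
    return {p.name: read_pdf_info(p) for p in sorted(Path(folder).glob("*.pdf"))}
```

Calling summarize_folder on the dumpster directory would give one metadata dict per file, which a script could then use to decide the target folder. Files with empty metadata would still need a text-extraction step that pdftk itself does not provide.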
On 07/Aug/2012, at 07.23, Yong Tang wrote:
> I am a full-time information science student and a part-time LAMP server administrator. I was recently thrown into a file dumpster containing hundreds of old PDF files. My job is to clean the dumpster up by putting the right files into the right folders. I am facing some difficulties writing a Perl script to get the job done. I would appreciate it if you could help.
> First of all, what tool or tools do you use to manipulate PDF files directly in a script? I tried some Perl modules such as CAM::PDF and PDF::API2. The results were not pretty. The original text formatting was lost.
> I regret that I did not take an XML class last semester, because my intuition is that the best way to do this job is to convert the PDFs to XML and then work on the XML with a script. Instead, I have to convert the PDFs to plain text. I found that PDFedit and Adobe Acrobat X Pro work well, because both keep the original text formatting after the conversion. However, I have no idea how to use them to convert multiple PDFs to plain text at once. I searched Google for answers but had no luck. Does anybody know how to do it?
> I am new to text processing. Maybe I am heading in the wrong direction for this project? Any input is appreciated.
> Yong Tang
> A student