Sorry for the short answer.

Maybe pdftk (http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/) would work for you.

sb

On 07 Aug 2012, at 07:23, Yong Tang wrote:

> Hi,
>
> I am a full-time information science student and a part-time LAMP server administrator. I was recently thrown into a file dumpster containing hundreds of old PDF files. My job is to clean the dumpster up by putting the right files into the right folders. I am having some difficulty writing a Perl script to get the job done. I would appreciate it if you could help.
>
> First of all, what tool or tools do you use to manipulate PDF files directly in a script? I tried some Perl modules such as CAM::PDF and PDF::API2. The results were not pretty: the original text formatting was lost.
>
> I regret that I did not take an XML class last semester, because my intuition is that the best way to do this job is to save the PDFs as XML and then work on the XML with a script. Instead, I have to save the PDFs as plain text. I found that PDFedit and Adobe Acrobat X Pro were good because both kept the original text formatting after the conversion. However, I have no idea how to use them to save multiple PDFs as plain text at once. I googled for answers but had no luck. Does anybody know how to do it?
>
> I am new to text processing. Maybe I am heading in the wrong direction for this project? Any input is appreciated.
>
> Yong Tang
> A student
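
For the "multiple PDFs at once" part of the question, here is a minimal sketch (in Python, not from the original thread) of looping over a folder and calling pdftk once per file. It assumes pdftk is installed and on PATH; the function names and the ".info.txt" suffix are hypothetical. As far as I know, pdftk's dump_data operation reports document metadata (title, page count and so on) rather than the page text, so this helps with sorting files by metadata, not with text conversion. The same loop should work with any command-line converter by swapping the command.

```python
import subprocess
from pathlib import Path


def plan_dump_commands(src_dir, out_dir):
    """Build one `pdftk <pdf> dump_data output <report>` command per PDF."""
    src, out = Path(src_dir), Path(out_dir)
    commands = []
    # Note: the "*.pdf" pattern is case-sensitive on most Unix systems.
    for pdf in sorted(src.glob("*.pdf")):
        report = out / (pdf.stem + ".info.txt")  # hypothetical naming scheme
        commands.append(["pdftk", str(pdf), "dump_data", "output", str(report)])
    return commands


def dump_all(src_dir, out_dir):
    """Run the planned commands; assumes pdftk is installed and on PATH."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in plan_dump_commands(src_dir, out_dir):
        # check=True raises CalledProcessError if pdftk exits non-zero.
        subprocess.run(cmd, check=True)
```

Calling dump_all("dumpster", "reports") (both folder names made up here) would write one metadata report per PDF. Building the command list separately from running it makes it easy to print and inspect the commands before touching hundreds of files.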