Hi,
I am a full-time information science student and a part-time LAMP server
administrator. I was recently thrown into a file dumpster containing
hundreds of old PDF files. My job is to clean the dumpster up by
putting the right files into the right folders. I am having some
difficulty writing a Perl script to get the job done. I would appreciate
it if you could help.
First of all, what tool or tools do you use to manipulate PDF files
directly in a script? I tried some Perl modules such as CAM::PDF and
PDF::API2. The results were not pretty: the original text formatting
was lost.
I regret that I did not take an XML class last semester, because my
intuition is that the best way to do this job is to convert the PDFs
to XML and then work on the XML with a script. Instead, I have to
save the PDFs as plain text. I found that PDFedit and Adobe Acrobat X
Pro work well because both of them keep the original text formatting
after the conversion. However, I have no idea how to use them to save
multiple PDFs as plain text at once. I googled for answers but had no
luck. Does anybody know how to do it?
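To make the batch step concrete, here is a minimal sketch of the kind of
loop in question. It is only an illustration under assumptions: it
assumes a command-line converter such as pdftotext (from Xpdf/Poppler)
is installed and on the PATH, which is not one of the tools mentioned
above, and the sub names (txt_path_for, find_pdfs, convert_pdf) and the
"write the .txt next to the .pdf" rule are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Hypothetical naming rule: foo.pdf -> foo.txt in the same folder.
sub txt_path_for {
    my ($pdf) = @_;
    (my $txt = $pdf) =~ s/\.pdf\z/.txt/i;
    return $txt;
}

# Collect every *.pdf under $dir, recursively.
sub find_pdfs {
    my ($dir) = @_;
    my @pdfs;
    find(sub { push @pdfs, $File::Find::name if -f && /\.pdf\z/i }, $dir);
    return sort @pdfs;
}

# Convert one PDF to text. ASSUMPTION: the external `pdftotext`
# command is available; -layout asks it to keep the physical layout.
sub convert_pdf {
    my ($pdf) = @_;
    my $txt = txt_path_for($pdf);
    # List-form system() bypasses the shell, so odd file names are safe.
    system('pdftotext', '-layout', $pdf, $txt) == 0
        or warn "pdftotext failed on $pdf\n";
    return $txt;
}

# Run only when executed directly, e.g.: perl pdf2txt.pl /path/to/dumpster
unless (caller) {
    my $dir = shift @ARGV // '.';
    convert_pdf($_) for find_pdfs($dir);
}

1;
```

Keeping the conversion in its own sub means the sorting logic (deciding
which folder each file belongs in) could later read the .txt files
without caring which converter produced them.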
I am new to text processing. Maybe I am heading in the wrong direction
for this project? Any input is appreciated.
Yong Tang
A student