Your question reminded me that I wanted to do something similar for a
pile of PDFs from our institutional repository. So I made a small script
in Python that can do this, using the Python module from:
http://pybrary.net/pyPdf/
You can then put the following in a script: (mind the indentation)
---8<------
import os, sys
import pyPdf
if len(sys.argv) > 1:
PATH = sys.argv[1]
else:
PATH = '.'
for dirpath, dirnames, filenames in os.walk(PATH):
for filename in filenames:
try:
filename_path = os.path.join(dirpath, filename)
checked_file = pyPdf.PdfFileReader(file(filename_path, "rb"))
except Exception, e:
sys.stderr.write('%s :: %s\n' % (filename_path, e))
---8<---------
If you run it without arguments, it checks the current directory, if you
specify a path it will walk down the tree and check each file found.
If a file is not recognised as a PDF it barfs on stderr.
Have fun.
Etienne Posthumus
TU Delft Library - Digital Product Development
t: +31 (0) 15 27 81 949
m: [log in to unmask]
skype: eposthumus
http://www.library.tudelft.nl/
Prometheusplein 1, 2628 ZC, Delft, Netherlands
|