On Fri, Jan 20, 2012 at 8:01 AM, Farrell, Larry D wrote:
> At this point I was primarily targeting PDF and Microsoft Office files that would be passed on to our cataloging folks for manual inspection if they were DRM protected. As has been pointed out on the list, general DRM detection is far trickier than I'd initially thought. I've been using Apache Tika for file type detection, metadata and full text extraction. However, when parsing encrypted or password protected files it throws the less-than-helpful "Unexpected Runtime Exception".
If you're looking for a marker of "PDFs that need manual inspection,"
then "causes Tika to throw a runtime exception" might be a pretty good
choice.
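A minimal sketch of that triage rule, in Python. The extractor is a hypothetical callable standing in for whatever wraps Tika; no particular Tika exception class is assumed, so any exception routes the file to the manual queue. That will also catch corrupt or merely malformed files, which arguably deserve a human look anyway.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical extractor type: takes a file path, returns its full text,
# and raises on failure (e.g. a wrapper around a Tika parse call).
Extractor = Callable[[Path], str]


def triage(paths: Iterable[Path], extract: Extractor) -> Tuple[Dict[Path, str], List[Path]]:
    """Split files into extracted text and a manual-inspection queue.

    Any exception from the extractor is treated as a sign the file may be
    encrypted, password protected, or otherwise needs a human to look at it.
    """
    extracted: Dict[Path, str] = {}
    manual: List[Path] = []
    for path in paths:
        try:
            extracted[path] = extract(path)
        except Exception:
            manual.append(path)
    return extracted, manual


# Usage with a fake extractor that fails on one file (hypothetical data).
def fake_extract(path: Path) -> str:
    if path.name == "locked.pdf":
        raise RuntimeError("Unexpected RuntimeException")
    return "text of " + path.name


text, queue = triage([Path("a.pdf"), Path("locked.pdf")], fake_extract)
```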
-n