> For a limited period of time I am making publicly available a Web-based program called PDF2TXT --

Looks very good, and thanks for sharing it. (It's certainly not the
first piece of software called pdf2txt, but that probably doesn't

> PDF2TXT extracts the text from an OCRed PDF document

The file I tried was born-digital (probably from Word), so perhaps
outside your intended scope. The text output was fairly similar to
that from pdftotext (in Ubuntu poppler-utils package), perhaps better
in losing the arbitrary line breaks, but fell over on macrons. There
were a lot of Māori words and the vowels with macrons disappeared -
e.g. Pākehā => Pkeh.
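
I don't know how PDF2TXT handles encodings internally, so the following is
only a guess at the cause: the symptom is exactly what you get when text
passes through an ASCII-only step that silently discards anything it cannot
represent. A minimal Python sketch (the function names are mine, purely for
illustration) reproduces it, and shows a gentler fallback for cases where
ASCII output really is required:

```python
import unicodedata


def strip_non_ascii(text: str) -> str:
    # Hypothetical lossy step: characters outside ASCII are dropped outright.
    return text.encode("ascii", errors="ignore").decode("ascii")


def fold_to_ascii(text: str) -> str:
    # Decompose each precomposed vowel into base letter + combining macron,
    # then drop only the combining marks, so the base letter survives.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


word = "P\u0101keh\u0101"  # "Pākehā", written with escapes for a-macron

print(strip_non_ascii(word))  # Pkeh
print(fold_to_ascii(word))    # Pakeha

# Keeping the text as Unicode and encoding it as UTF-8 loses nothing.
assert word.encode("utf-8").decode("utf-8") == word
```

Keeping everything in UTF-8 from extraction through to display would be the
better fix, since the macrons carry meaning in Māori; the folding version is
only a fallback.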

I assume Unicode issues were also at the heart of %3Cunknown%3E being
one of the "most frequent verbs".  The link for this [1] gives a regex