That worked pretty well. There is still some cleanup I have to do,
but replacing [A-z]^p[A-z] with [A-z] [A-z] took care of a lot of it.
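
For anyone who would rather script this than use Word, here is a rough
Python sketch of the marker approach from Kyle's message quoted below.
It is only a sketch under a few assumptions: the OCR text has been saved
as plain text, a blank line marks a real paragraph break, and the last
step puts the paragraph breaks back as blank lines. The function name
unwrap_paragraphs is made up for illustration.

```python
import re

# Placeholder from the quoted suggestion; any string that does not occur
# in the document would work.
MARKER = "@ZZZ@"


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines while keeping real paragraph breaks.

    Assumption: a real paragraph break is two or more consecutive newlines.
    """
    # Normalize Windows line endings so only "\n" has to be handled.
    text = text.replace("\r\n", "\n")
    # Step 1: protect real paragraph breaks with the unique marker.
    text = re.sub(r"\n{2,}", MARKER, text)
    # Step 2: every remaining line break is a wrap, so turn it into a space.
    text = text.replace("\n", " ")
    # Step 3 (assumed): restore the protected paragraph breaks.
    return text.replace(MARKER, "\n\n")


if __name__ == "__main__":
    sample = "first line of\na paragraph\n\nsecond\nparagraph"
    print(unwrap_paragraphs(sample))
```

This joins every wrapped line, not just the ones with a letter on each
side of the break, so hyphenated line ends and stray double spaces would
still need a separate pass.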
On Tue, Aug 4, 2015 at 12:17 PM, Kyle Banerjee <[log in to unmask]> wrote:
> On Tue, Aug 4, 2015 at 6:09 AM, Matt Sherman <[log in to unmask]> wrote:
>> I am on Windows machines, so I don't have quite the easy access to
>> that useful command. Someone had earlier put the OCR in a doc file so
>> I've been playing with that more than with the raw PDF OCR.
> Versions of the unix utilities that run on Windows are available, but you
> can just use Microsoft Word to do what you want. Just use the find/replace
> function. In Word, you can search for a paragraph marker by looking for
> "^p" (caret p).
> Because you undoubtedly have real paragraphs in the document which you
> don't want to remove, I'd recommend substituting double paragraph marks
> with something unique (e.g. "@ZZZ@") before replacing all the other
> paragraph marks with a space. Then replace your unique marker with a