On Wed, Dec 14, 2011 at 02:19:43PM -0600, Jon Gorman wrote:
> pdftotext -> some cut & paste / sed / regex -> open in excel?
> 
> You might need to fiddle with the pdftotext settings, but I've been
> pretty successful with that before doing something else.

This is how I use pdftotext for this purpose:
pdftotext -nopgbrk -layout input.pdf output.txt

For those wondering what this is: pdftotext is a command-line tool
from the poppler-utils package (that is the package name in Debian and
Ubuntu Linux; see http://poppler.freedesktop.org for the source code).
A Windows version is available here: http://www.foolabs.com/xpdf/download.html

The resulting file, here called output.txt, contains plain text with
the page layout left approximately intact, so table columns stay
aligned. Now you can (manually or otherwise) save the tables from this
file into files with .csv, .tsv or .dat extensions, and with any luck,
R's read.table() function, other statistics software, and most
spreadsheet software will be able to open these files and make sense
of them. Otherwise, you will need to do some postprocessing/postediting.
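
If you want to script the "otherwise" part, here is a minimal sketch in
Python. It is only an illustration, not something pdftotext provides:
the function names and the sample lines are made up, and it assumes
that columns in the -layout output are separated by two or more spaces
while single spaces occur only inside a cell. Real tables often break
that assumption (empty cells, tightly packed columns), so treat it as
a starting point.

```python
import csv
import io
import re


def layout_lines_to_rows(lines):
    """Split layout-preserved text lines into table cells.

    Assumption: columns are separated by runs of two or more
    whitespace characters; single spaces belong to a cell.
    """
    rows = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines between table rows
        rows.append(re.split(r"\s{2,}", stripped))
    return rows


def rows_to_csv(rows):
    """Serialize rows as CSV text; the csv module handles quoting."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Hypothetical sample of what output.txt might contain for a small table.
sample = [
    "Name        Count   Share",
    "",
    "Alpha          12    0.40",
    "Beta gamma     18    0.60",
]

rows = layout_lines_to_rows(sample)
print(rows_to_csv(rows), end="")
```

To run this on real output, read the lines of output.txt in place of
the sample list and write the result to a .csv file.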

Cheers,
Christian

-- 
  Christian Pietsch <http://purl.org/net/pietsch>