LISTSERV 16.5 - CODE4LIB Archives

Danielle Reay wrote:
> Hello,
>
> We have a faculty member looking to create a dataset from an annotated
> bibliography she compiled. Right now it exists as a word file and as a pdf.
> The entries are relatively structured with a citation and an abstract, but
> the document is about 150 pages long with multiple entries per page. Rather
> than manually copy and paste everything to create the spreadsheet/csv, I
> wanted to ask for suggestions or approaches to doing this by either
> scraping or extracting structured data from the pdf. Thanks very much in
> advance!
>
>
I'd sure like to find a tool for this as well, though in my case the
purpose would be to extract numbered requirements from RFPs.

There seems to be a distinct dearth of text analysis tools that can
actually do structural analysis based on numbering.
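
For what it's worth, here is a rough sketch of what numbering-based
extraction might look like in Python, assuming the document has already
been converted to plain text by some other tool. The function names, the
regular expression, and the sample text are all made up for illustration;
real RFPs (or bibliographies) will almost certainly need a pattern tuned
to their own numbering scheme, and any line that merely begins with a
number would be picked up as an item by this naive version.

```python
import csv
import io
import re

# Hypothetical pattern: a line starting with a dotted number such as
# "3", "3.1" or "3.1.2)", then whitespace, then the item text.
NUMBERED = re.compile(r"^\s*(\d+(?:\.\d+)*)[.)]?\s+(\S.*)$")


def split_numbered_items(text):
    """Return a list of dicts holding each item's number, depth, and text."""
    items = []
    for line in text.splitlines():
        match = NUMBERED.match(line)
        if match:
            number = match.group(1)
            items.append({
                "number": number,
                # "1" has depth 1, "1.1" has depth 2, and so on.
                "depth": number.count(".") + 1,
                "text": match.group(2).strip(),
            })
        elif items and line.strip():
            # Assumption: an unnumbered, non-blank line continues the
            # previous item. Text before the first item is ignored.
            items[-1]["text"] += " " + line.strip()
    return items


def items_to_csv(items):
    """Serialize the items as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["number", "depth", "text"])
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


# Invented sample input, not taken from any real RFP.
SAMPLE = """\
1. Scope
1.1 The vendor shall provide hosting
    for all services.
1.2 The vendor shall provide support.
2. Schedule
"""

if __name__ == "__main__":
    print(items_to_csv(split_numbered_items(SAMPLE)))
```

The same idea could probably be adapted to the bibliography case by
splitting on whatever marks the start of each entry instead of on a
number, but that depends on how regular the entries really are.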

Miles Fidelman

-- 
In theory, there is no difference between theory and practice.
In practice, there is.  .... Yogi Berra

Theory is when you know everything but nothing works.
Practice is when everything works but no one knows why.
In our lab, theory and practice are combined:
nothing works and no one knows why.  ... unknown