And for going beyond the bibliographic citations to include abstracts as
well, https://grobid.readthedocs.io/en/latest/ might be useful. --Kevin
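Whichever tool ends up parsing the citations, the text probably has to be split into one record per entry first. Here is a minimal sketch of that step, not a tested solution for this particular document. It assumes the Word file has been saved as plain text, that entries are separated by blank lines, and that the first line of each entry is the citation with the abstract on the lines after it. Those layout assumptions and the function names are mine, not something known about the actual bibliography.

```python
import csv
import io
import re


def split_entries(text):
    """Split plain text into entries separated by one or more blank lines."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [block for block in blocks if block.strip()]


def parse_entry(block):
    """Treat the first non-empty line as the citation, the rest as the abstract.

    Assumption: each citation sits on a single line (as is typical when a
    Word paragraph is exported to plain text).
    """
    lines = [line.strip() for line in block.splitlines() if line.strip()]
    return {"citation": lines[0], "abstract": " ".join(lines[1:])}


def to_csv(text):
    """Return CSV text with one row per entry (columns: citation, abstract)."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["citation", "abstract"])
    writer.writeheader()
    for block in split_entries(text):
        writer.writerow(parse_entry(block))
    return out.getvalue()
```

The citation column of the resulting CSV could then be run through text2bib or GROBID to break each citation into author, title, year, and so on. If the real entries are delimited some other way (numbering, hanging indents, a bold first line), the split pattern would need to change to match.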
On 5/12/22 1:49 PM, Julia Bauder wrote:
> Hi, Danielle,
> Have you taken a look at https://text2bib.economics.utoronto.ca/ ? If it
> works for you, that's likely to be one of the easiest methods to convert
> the list into structured data.
> Julia Bauder
> Social Studies and Data Services Librarian
> Director, Data Analysis and Social Inquiry Lab
> Grinnell College Libraries
> 1111 6th Ave.
> Grinnell, IA 50112
> On Thu, May 12, 2022 at 1:40 PM Danielle Reay wrote:
>> We have a faculty member looking to create a dataset from an annotated
>> bibliography she compiled. Right now it exists as a Word file and as a PDF.
>> The entries are relatively structured with a citation and an abstract, but
>> the document is about 150 pages long with multiple entries per page. Rather
>> than manually copy and paste everything to create the spreadsheet/csv, I
>> wanted to ask for suggestions or approaches to doing this by either
>> scraping or extracting structured data from the PDF. Thanks very much in
>> advance.
>> Danielle Reay
>> Digital Scholarship Technology Manager
>> Drew University