And for going beyond the bibliographic citations to include abstracts as well, https://grobid.readthedocs.io/en/latest/ might be useful.

--Kevin

On 5/12/22 1:49 PM, Julia Bauder wrote:
> Hi, Danielle,
>
> Have you taken a look at https://text2bib.economics.utoronto.ca/ ? If it
> works for you, that's likely to be one of the easiest methods to convert
> the list into structured data.
>
> Best,
> Julia
>
> _____________________________________________________
> Julia Bauder
> Social Studies and Data Services Librarian
> Director, Data Analysis and Social Inquiry Lab
> Grinnell College Libraries
> 1111 6th Ave.
> Grinnell, IA 50112
>
> On Thu, May 12, 2022 at 1:40 PM Danielle Reay wrote:
>
>> Hello,
>>
>> We have a faculty member looking to create a dataset from an annotated
>> bibliography she compiled. Right now it exists as a Word file and as a PDF.
>> The entries are relatively structured with a citation and an abstract, but
>> the document is about 150 pages long with multiple entries per page. Rather
>> than manually copy and paste everything to create the spreadsheet/CSV, I
>> wanted to ask for suggestions or approaches to doing this by either
>> scraping or extracting structured data from the PDF. Thanks very much in
>> advance!
>>
>> Danielle Reay
>>
>> Digital Scholarship Technology Manager
>> Drew University
>>