Hi,

Just a note in case someone wants to do this at scale: Statistics Canada has an AI tool to convert PDFs to CSV.
https://www.statcan.gc.ca/en/data-science/projects#pdf-extraction

On Wed, Jun 22, 2022, 5:59 AM Owen Stephens <[log in to unmask]> wrote:

> There was some work at the Wellcome Collection several years ago looking at
> extracting tabular information from digitised materials - a brief review
> suggests that Abbyy FineReader Engine 11 was used to identify tables,
> although there were a number of challenges - how far those challenges were
> overcome wasn't clear to me from a brief review, but if this is of interest
> there's a post at
>
> https://stacks.wellcomecollection.org/1-million-tables-and-counting-7e7e6c9f76e
> plus a report the Wellcome Collection commissioned at
>
> https://github.com/wellcometrust/wellcomecollection.org/files/2148381/Scoping.MOH.for.data.recovery.report.-.final.pdf
>
> Christy Henshaw at the Wellcome Collection may be able to share some of
> their experience and learning if you reach out to them
> https://twitter.com/chenshaw
>
> Best wishes
>
> Owen
>
> On Tue, 21 Jun 2022 at 19:47, Medina-Smith, Andrea M. (Fed) <
> [log in to unmask]> wrote:
>
> > Hello List,
> >
> > Has anyone had success converting tables in a PDF to CSV? These are scans
> > of paper from the 70s on forward. I know this isn’t a super easy
> > conversion, but I would think it’s not impossible either.
> >
> > Thanks,
> > Andrea
> >
> > --
> >
> > Andrea Medina-Smith
> > Data Librarian
> > Information Services Office
> > National Institute of Standards and Technology
> > [log in to unmask]
> > https://orcid.org/0000-0002-1217-701X
> >
>
> --
> Owen Stephens
> Owen Stephens Consulting
> Web: http://www.ostephens.com
> Email: [log in to unmask]
>