LISTSERV 16.5 - CODE4LIB Archives

The Wellcome Collection did some work several years ago on extracting
tabular information from digitised materials. From a brief review, it looks
as though ABBYY FineReader Engine 11 was used to identify tables, although
there were a number of challenges, and it wasn't clear to me how far those
challenges were overcome. If this is of interest, there's a post at
https://stacks.wellcomecollection.org/1-million-tables-and-counting-7e7e6c9f76e
plus a report the Wellcome Collection commissioned at
https://github.com/wellcometrust/wellcomecollection.org/files/2148381/Scoping.MOH.for.data.recovery.report.-.final.pdf

Christy Henshaw at the Wellcome Collection may be able to share some of
their experience and learning if you reach out:
https://twitter.com/chenshaw

Best wishes

Owen

On Tue, 21 Jun 2022 at 19:47, Medina-Smith, Andrea M. (Fed) wrote:

> Hello List,
>
> Has anyone had success converting tables in a PDF to CSV? These are scans
> of paper from the 70s on forward. I know this isn’t a super easy
> conversion, but I would think it’s not impossible either.
>
> Thanks,
> Andrea
>
> --
>
> Andrea Medina-Smith
> Data Librarian
> Information Services Office
> National Institute of Standards and Technology
> https://orcid.org/0000-0002-1217-701X
>
>
>

-- 
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com