Tabula is pretty miraculous at turning hamburgers back into cows, but scans from the 70s are a lot to ask. Still, I would try it. https://tabula.technology/

Christina

-----Original Message-----
From: Code for Libraries <[log in to unmask]> On Behalf Of Haitz, Lisa (haitzlm)
Sent: Tuesday, June 21, 2022 3:02 PM
To: [log in to unmask]
Subject: [EXT] Re: [CODE4LIB] Converting old tables in PDF to CSV

Acrobat (full version) has an export to Excel function. I’ve used it before, and my table data was exported correctly, with each value in its own Excel cell.

😊

From: Code for Libraries <[log in to unmask]> on behalf of Matt Sherman <[log in to unmask]>
Date: Tuesday, June 21, 2022 at 2:53 PM
To: [log in to unmask] <[log in to unmask]>
Subject: Re: [CODE4LIB] Converting old tables in PDF to CSV


Hm, that should be doable, but it is an annoying amount of work. I haven't done it with tables, but I have done it with bibliographic records and regex.
It helps if there is a very consistent structure to the OCR.
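
A minimal sketch of that regex approach, assuming the OCR text keeps columns apart with runs of two or more spaces. The sample lines and the function names are made up for illustration, and real scans would likely need extra cleanup rules:

```python
import csv
import io
import re

# Hypothetical OCR output: columns are separated by two or more spaces,
# while a single space can occur inside a cell ("Iron ore").
SAMPLE_OCR = """\
Year   Sample     Value
1974   Copper     8.96

1975   Iron ore   7.87
"""

# Assumption: a run of 2+ spaces or tabs marks a column boundary.
COLUMN_GAP = re.compile(r"[ \t]{2,}")


def ocr_text_to_rows(text):
    """Split each non-blank OCR line into cells on wide whitespace gaps."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines left over from the scan
        rows.append(COLUMN_GAP.split(line))
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text using the standard csv module."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(ocr_text_to_rows(SAMPLE_OCR)))
```

This only works when the gap rule holds on every line, so it is worth checking that each row comes out with the same number of cells before trusting the CSV.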

On Tue, Jun 21, 2022 at 1:47 PM Medina-Smith, Andrea M. (Fed) < [log in to unmask]> wrote:

> Hello List,
>
> Has anyone had success converting tables in a PDF to CSV? These are 
> scans of paper from the 70s on forward. I know this isn’t a super easy 
> conversion, but I would think it’s not impossible either.
>
> Thanks,
> Andrea
>
> --
>
> Andrea Medina-Smith
> Data Librarian
> Information Services Office
> National Institute of Standards and Technology 
> [log in to unmask]<mailto:[log in to unmask]>
> https://orcid.org/0000-0002-1217-701X
>
>
>