Hi All,
Good morning from foggy Maryland. Materials scientists and aerospace engineers have technical reports from the 1970s and 1980s (and probably earlier) with vast tables of experimental data. Pasting a picture below to give a flavor:
[inline image]
I can OCR this with Acrobat Pro, but what's the current best tool for extracting the tables? We can't upload this into a commercial service, and our on-prem AI models went "NOPE!" I tried Tabula, which did OK with some of the tables sprinkled through the text but not with ones like the one shown in the image. It looks like there are a number of tools intended for AI and RAG (like Docling). Does anyone have experience using these for this purpose?
I'm also interested in paid services, depending on a number of factors.
Thanks in advance,
Christina
Christina K. Pikas, PhD
Principal Professional Staff
Johns Hopkins Applied Physics Laboratory
11100 Johns Hopkins Rd, Laurel, MD 20723
O: (240) 228-4812