Hello, Graeme:

It's not a foolish question.  The download file is a pretty good size, and that can make it seem a little unwieldy.

The "New Pilot" JSON downloads are newline delimited, meaning each resource is on its own line.  As chance would have it, this Stack Overflow post pretty much describes the source JSON you are dealing with and a solution for reading it line by line.  (See the next answer too.)

https://stackoverflow.com/a/12451465
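
For what it's worth, here is a minimal Python sketch of the line-by-line approach.  It assumes the download has already been decompressed to a UTF-8 text file with one JSON resource per line.  The function name iter_records is just an illustrative name, and the sketch makes no assumption about what is inside each resource.

```python
import json
from typing import Any, Iterator


def iter_records(path: str) -> Iterator[Any]:
    """Yield one parsed JSON value per non-blank line of a newline-delimited file.

    Only one line is held in memory at a time, so the overall file size
    matters much less than it would with a single json.load() call.
    """
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            yield json.loads(line)
```

You would then loop over iter_records() with the path to your local copy of the download and pull whatever fields you need out of each resource as it goes by.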

Best of luck,
Kevin

--
Kevin Ford
Library of Congress
Washington, DC


-----Original Message-----
From: Code for Libraries <[log in to unmask]> On Behalf Of Graeme Williams
Sent: Thursday, April 1, 2021 5:11 PM
To: [log in to unmask]
Subject: [CODE4LIB] A question ...

You can download Library of Congress authority files in various formats from https://id.loc.gov/download/

Since it's April Fool's Day, let me ask a foolish question:  does anyone have code for extracting data from these files?  Preferably the MADS/RDF/JSON format files.  Preferably Python code.

I'm interested in extracting (e.g.) the names from the name authority file, so I can check the various name fields in a MARC record (e.g., 100 $a) -- and do it locally, without calling an API for each entry.

Graeme Williams
Las Vegas, NV
github.com/lagbolt