There's no reason to screen scrape the results. The label service permits
the use of the "Accept" header. For example:

    curl -i -L -H "Accept: application/rdf+xml" http://id.loc.gov/authorities/label/orchids

Take note of the initial set of response headers:

    HTTP/1.1 302 FOUND
    Location: http://id.loc.gov/authorities/subjects/sh85095334
    X-URI: http://id.loc.gov/authorities/subjects/sh85095334
    X-PrefLabel: Orchids
    Cache-Control: public, max-age=1209600
    Content-Length: 0
    Date: Sat, 29 Jul 2017 12:41:00 GMT
    Server: Apache
    X-Varnish: 95467183 53781367
    Age: 2343793
    Via: 1.1 varnish-v4
    X-Cache: HIT
    X-Cache-Hits: 24
    Connection: keep-alive

If you want, you could perform only a HEAD request on the label service and
then use the X-URI and X-PrefLabel headers to gather the info you need.

NB: The service works on an exact match, more or less; take off the 's' of
'orchids' and you'll get an entirely different result.

You can also operate on the search results - not the label service -
programmatically. See "Supported Search serialization formats" here:

http://id.loc.gov/techcenter/serializations.html

There is one XML-based option and a JSON one too.

Yours,
Kevin

On 8/25/17 10:39, Josh Welker wrote:
> Thanks, Nathan. That looks like it will work if I do it manually, but there
> is no interface for doing it programmatically. Is LC okay with me screen
> scraping the search results?
>
> Joshua Welker
> Information Technology Librarian
> James C. Kirkpatrick Library
> University of Central Missouri
> Warrensburg, MO 64093
> JCKL 2260
> 660.543.8022
>
>
> On Fri, Aug 25, 2017 at 10:18 AM, Trail, Nate <[log in to unmask]> wrote:
>
>> You can try our "label" service. See under "known label retrieval" here:
>> http://id.loc.gov/techcenter/searching.html
>> I would be glad to help further.
>>
>> Thanks, Nate
>>
>> -----------------------------------------
>> Nate Trail
>> Network Development & MARC Standards Office
>> LS/ABA/NDMSO
>> LA308, Mail Stop 4402
>> Library of Congress
>> Washington DC 20540
>>
>>
>>
>>
>> -----Original Message-----
>> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
>> Josh Welker
>> Sent: Friday, August 25, 2017 11:12 AM
>> To: [log in to unmask]
>> Subject: [CODE4LIB] Searching LC Name Authority file programmatically
>>
>> I have sort of inherited authority control recently at my library, and I
>> want to find some way to automate some common workflows. I am looking for
>> an easy way to query blind name references against the LC Name Authority
>> master file. There is no API for searching it on the web, and the name file
>> itself is 10+ GB and hard to work with.
>>
>> Here are options as I see them:
>>
>>
>> - Screen scrape the search engine at id.loc.gov.
>> - Load the 10+ GB name file into a local database to query
>> programmatically.
>>
>> Does anyone have experience with either method? Does some other method
>> exist I am not aware of?
>>
>> Joshua Welker
>> Information Technology Librarian
>> James C. Kirkpatrick Library
>> University of Central Missouri
>> Warrensburg, MO 64093
>> JCKL 2260
>> 660.543.8022
>>
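
For anyone following the thread: the HEAD-request approach suggested in the
reply at the top of this message could be sketched in Python roughly as
follows. This is only a sketch under assumptions, not an official client. The
function names (label_url, extract_match, lookup_label) are made up for
illustration, percent-encoding the label is my assumption about what the
service expects, and the code assumes the service still answers plain http
with a redirect carrying the X-URI and X-PrefLabel headers shown in the curl
output.

```python
import http.client
from urllib.parse import quote, urlsplit

# Base URL taken from the curl example in the thread.
LABEL_BASE = "http://id.loc.gov/authorities/label/"


def label_url(label):
    """Build the label-service URL for a label (encoding is an assumption)."""
    return LABEL_BASE + quote(label, safe="")


def extract_match(status, headers):
    """Return (uri, pref_label) from a redirect response, or None otherwise."""
    lowered = {name.lower(): value for name, value in headers.items()}
    uri = lowered.get("x-uri")
    if not (300 <= status < 400) or uri is None:
        return None
    return uri, lowered.get("x-preflabel")


def lookup_label(label, timeout=10):
    """Send one HEAD request; http.client does not follow the redirect."""
    parts = urlsplit(label_url(label))
    conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path,
                     headers={"Accept": "application/rdf+xml"})
        resp = conn.getresponse()
        return extract_match(resp.status, dict(resp.getheaders()))
    finally:
        conn.close()
```

Calling lookup_label("orchids") would be expected to give back the X-URI and
X-PrefLabel values when the label matches, and None when it does not. Because
the match is more or less exact, a None result says only that this exact
string was not found, not that no authority record exists.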