There's no reason to screen scrape the results.

The label service permits the use of the "Accept" header.  For example:

curl -i -L -H "Accept: application/rdf+xml" \
  http://id.loc.gov/authorities/label/orchids

Take note of the initial set of response headers:

HTTP/1.1 302 FOUND
Location: http://id.loc.gov/authorities/subjects/sh85095334
X-URI: http://id.loc.gov/authorities/subjects/sh85095334
X-PrefLabel: Orchids
Cache-Control: public, max-age=1209600
Content-Length: 0
Date: Sat, 29 Jul 2017 12:41:00 GMT
Server: Apache
X-Varnish: 95467183 53781367
Age: 2343793
Via: 1.1 varnish-v4
X-Cache: HIT
X-Cache-Hits: 24
Connection: keep-alive

If you want, you could perform only a HEAD request on the label service 
and then use the X-URI and X-PrefLabel headers to gather the info you 
need.  NB: The service works on an exact match, more or less; take off 
the 's' of 'orchids' and you'll get an entirely different result.
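
A minimal Python sketch of that HEAD-request approach, using only the 
standard library.  The function names (extract_label_info, lookup_label) 
and the _NoRedirect helper are made up for illustration; the URL pattern 
and header names are the ones shown above.  It assumes that a label with 
no match simply comes back without an X-URI header, which I have not 
verified for every case.

```python
import urllib.error
import urllib.parse
import urllib.request

# URL pattern taken from the curl example above.
LABEL_BASE = "http://id.loc.gov/authorities/label/"


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Hypothetical helper: stop urllib from following the 302,
    so the headers of the redirect response itself can be read."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def extract_label_info(headers):
    """Return (uri, preferred_label) from a mapping of response headers.

    Header names are compared case-insensitively; missing headers
    come back as None.
    """
    lowered = {key.lower(): value for key, value in headers.items()}
    return lowered.get("x-uri"), lowered.get("x-preflabel")


def lookup_label(label, timeout=10):
    """Send a HEAD request for `label` to the label service.

    Returns (uri, preferred_label), or None when the response carries
    no X-URI header (assumed here to mean "no match").
    """
    url = LABEL_BASE + urllib.parse.quote(label)
    request = urllib.request.Request(url, method="HEAD")
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        headers = opener.open(request, timeout=timeout).headers
    except urllib.error.HTTPError as err:
        # With redirects disabled, the 302 surfaces as an HTTPError
        # whose headers are the ones we are after.
        headers = err.headers
    uri, pref_label = extract_label_info(headers)
    return (uri, pref_label) if uri else None
```

Calling lookup_label("orchids") should then hand back the X-URI and 
X-PrefLabel values from the 302 response without fetching the record 
itself.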

You can also operate on the search results - not the label service - 
programmatically.  See "Supported Search serialization formats" here: 
http://id.loc.gov/techcenter/serializations.html   There is one XML-based 
option and a JSON one too.

Yours,
Kevin



On 8/25/17 10:39, Josh Welker wrote:
> Thanks, Nathan. That looks like it will work if I do it manually, but there
> is no interface for doing it programmatically. Is LC okay with me screen
> scraping the search results?
> 
> Joshua Welker
> Information Technology Librarian
> James C. Kirkpatrick Library
> University of Central Missouri
> Warrensburg, MO 64093
> JCKL 2260
> 660.543.8022
> 
> 
> On Fri, Aug 25, 2017 at 10:18 AM, Trail, Nate <[log in to unmask]> wrote:
> 
>> You can try our "label" service. See under "known label retrieval" here:
>> http://id.loc.gov/techcenter/searching.html
>> I would be glad to help further.
>>
>> Thanks, Nate
>>
>> -----------------------------------------
>> Nate Trail
>> Network Development & MARC Standards Office
>> LS/ABA/NDMSO
>> LA308, Mail Stop 4402
>> Library of Congress
>> Washington DC 20540
>>
>>
>>
>>
>> -----Original Message-----
>> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
>> Josh Welker
>> Sent: Friday, August 25, 2017 11:12 AM
>> To: [log in to unmask]
>> Subject: [CODE4LIB] Searching LC Name Authority file programmatically
>>
>> I have sort of inherited authority control recently at my library, and I
>> want to find some way to automate some common workflows. I am looking for
>> an easy way to query blind name references against the LC Name Authority
>> master file. There is no API for searching it on the web, and the name file
>> itself is 10+ GB and hard to work with.
>>
>> Here are options as I see them:
>>
>>
>>     - Screen scrape the search engine at id.loc.gov.
>>     - Load the 10+ GB name file into a local database to query
>>     programmatically.
>>
>> Does anyone have experience with either method? Does some other method
>> exist I am not aware of?
>>
>> Joshua Welker
>> Information Technology Librarian
>> James C. Kirkpatrick Library
>> University of Central Missouri
>> Warrensburg, MO 64093
>> JCKL 2260
>> 660.543.8022
>>