That's exactly what I needed. Thanks, Kevin!

Joshua Welker
Information Technology Librarian
James C. Kirkpatrick Library
University of Central Missouri
Warrensburg, MO 64093
JCKL 2260
660.543.8022


On Fri, Aug 25, 2017 at 10:50 AM, Kevin Ford <[log in to unmask]> wrote:

> There's no reason to screen scrape the results.
>
> The label service permits the use of the "Accept" header.  For example:
>
> curl -i -L -H "Accept: application/rdf+xml" http://id.loc.gov/authorities/label/orchids
>
> Take note of the initial set of response headers:
>
> HTTP/1.1 302 FOUND
> Location: http://id.loc.gov/authorities/subjects/sh85095334
> X-URI: http://id.loc.gov/authorities/subjects/sh85095334
> X-PrefLabel: Orchids
> Cache-Control: public, max-age=1209600
> Content-Length: 0
> Date: Sat, 29 Jul 2017 12:41:00 GMT
> Server: Apache
> X-Varnish: 95467183 53781367
> Age: 2343793
> Via: 1.1 varnish-v4
> X-Cache: HIT
> X-Cache-Hits: 24
> Connection: keep-alive
>
> If you want, you could perform only a HEAD request on the label service
> and then use the X-URI and X-PrefLabel headers to gather the info you
> need.  NB: The service works on an exact match, more or less; take off the
> 's' of 'orchids' and you'll get an entirely different result.
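
[Editor's sketch, not part of Kevin's message.] The HEAD-request approach described above could look roughly like the following in Python. The path prefix and header names come from the curl example and response headers quoted above; the function names, the percent-encoding of the label, and the use of plain HTTP are assumptions, and the service may require HTTPS today.

```python
import http.client
from urllib.parse import quote

# Path prefix taken from the curl example in the message above.
LABEL_PREFIX = "/authorities/label/"


def label_path(label):
    """Build the request path for an exact-match label lookup."""
    # Percent-encode everything, including the commas and spaces that
    # are common in headings (assumed to be what the service expects).
    return LABEL_PREFIX + quote(label, safe="")


def extract_label_info(header_pairs):
    """Return (uri, pref_label) from an iterable of (name, value) pairs.

    Header names are compared case-insensitively; a missing header
    yields None in that position.
    """
    headers = {name.lower(): value for name, value in header_pairs}
    return headers.get("x-uri"), headers.get("x-preflabel")


def lookup_label(label, host="id.loc.gov", timeout=10):
    """Send a single HEAD request and read the headers of the response.

    http.client does not follow redirects, so the headers of the first
    response (the 302 in the example above) are the ones returned.
    Assumption: plain HTTP, as in the 2017 example; swap in
    http.client.HTTPSConnection if the host now insists on HTTPS.
    """
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", label_path(label))
        response = conn.getresponse()
        return response.status, extract_label_info(response.getheaders())
    finally:
        conn.close()
```

A call such as `status, (uri, pref_label) = lookup_label("orchids")` would then give the status code plus the two header values, with None for either header when the label does not match.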
>
> You can also operate on the search results - not the label service -
> programmatically.  See "Supported Search serialization formats" here:
> http://id.loc.gov/techcenter/serializations.html   There is one XML-based
> option and a JSON one too.
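
[Editor's sketch, not part of Kevin's message.] Requesting search results in a machine-readable format might look like this. The endpoint path and the `q` and `format` parameter names are assumptions that should be checked against the serializations page linked above; the shape of the returned JSON is not assumed here, so the parsed value is handed back as-is.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed search endpoint; verify against the serializations page.
SEARCH_BASE = "http://id.loc.gov/search/"


def build_search_url(query, fmt="json"):
    """Return a search URL that asks for a machine-readable serialization."""
    # `q` and `format` are assumed parameter names, not confirmed by the thread.
    return SEARCH_BASE + "?" + urlencode({"q": query, "format": fmt})


def fetch_search_results(query, timeout=10):
    """Fetch the JSON serialization and return the parsed structure unchanged."""
    with urlopen(build_search_url(query), timeout=timeout) as response:
        return json.load(response)
```

Unlike the label service, this route returns a result list rather than a single exact match, so the caller still has to decide which hit, if any, corresponds to the heading being checked.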
>
> Yours,
> Kevin
>
>
>
>
> On 8/25/17 10:39, Josh Welker wrote:
>
>> Thanks, Nathan. That looks like it will work if I do it manually, but there
>> is no interface for doing it programmatically. Is LC okay with me screen
>> scraping the search results?
>>
>> Joshua Welker
>> Information Technology Librarian
>> James C. Kirkpatrick Library
>> University of Central Missouri
>> Warrensburg, MO 64093
>> JCKL 2260
>> 660.543.8022
>>
>>
>> On Fri, Aug 25, 2017 at 10:18 AM, Trail, Nate <[log in to unmask]> wrote:
>>
>>> You can try our "label" service. See under "known label retrieval" here:
>>> http://id.loc.gov/techcenter/searching.html
>>> I would be glad to help further.
>>>
>>> Thanks, Nate
>>>
>>> -----------------------------------------
>>> Nate Trail
>>> Network Development & MARC Standards Office
>>> LS/ABA/NDMSO
>>> LA308, Mail Stop 4402
>>> Library of Congress
>>> Washington DC 20540
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
>>> Josh Welker
>>> Sent: Friday, August 25, 2017 11:12 AM
>>> To: [log in to unmask]
>>> Subject: [CODE4LIB] Searching LC Name Authority file programmatically
>>>
>>> I have sort of inherited authority control recently at my library, and I
>>> want to find some way to automate some common workflows. I am looking for
>>> an easy way to query blind name references against the LC Name Authority
>>> master file. There is no API for searching it on the web, and the name file
>>> itself is 10+ GB and hard to work with.
>>>
>>> Here are options as I see them:
>>>
>>>
>>>     - Screen scrape the search engine at id.loc.gov.
>>>     - Load the 10+ GB name file into a local database to query
>>>     programmatically.
>>>
>>> Does anyone have experience with either method? Does some other method
>>> exist I am not aware of?
>>>
>>> Joshua Welker
>>> Information Technology Librarian
>>> James C. Kirkpatrick Library
>>> University of Central Missouri
>>> Warrensburg, MO 64093
>>> JCKL 2260
>>> 660.543.8022
>>>
>>>