
OCLC's FAST contains corporate names derived from LCSH.

http://www.oclc.org/research/activities/fast.html

I wrote a simple proxy to the FAST API that can be used as a
reconciliation endpoint in OpenRefine.

https://github.com/lawlesst/fast-reconcile
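OpenRefine reconciliation services receive a batch of queries as one JSON object keyed `q0`, `q1`, and so on, passed in a `queries` parameter. Below is a minimal base-R sketch of building such a batch. It assumes the standard reconciliation protocol. `reconcile_endpoint` is a hypothetical placeholder, not the actual address of the fast-reconcile proxy.

```r
# Escape backslashes and double quotes so a name can sit inside a JSON string.
json_escape <- function(x) {
  x <- gsub("\\\\", "\\\\\\\\", x)  # one backslash -> two
  gsub('"', '\\\\"', x)             # " -> \"
}

# Build the JSON batch for the `queries` parameter:
# {"q0":{"query":"...","limit":3},"q1":{...}}
build_reconcile_queries <- function(terms, limit = 3) {
  parts <- sprintf('"q%d":{"query":"%s","limit":%d}',
                   seq_along(terms) - 1L, json_escape(terms), as.integer(limit))
  paste0("{", paste(parts, collapse = ","), "}")
}

# Hypothetical endpoint: point this at wherever the proxy is actually running.
reconcile_endpoint <- "http://localhost:5000/reconcile"

# Percent-encode the whole JSON batch and attach it as a GET parameter.
build_reconcile_url <- function(endpoint, terms, limit = 3) {
  paste0(endpoint, "?queries=",
         URLencode(build_reconcile_queries(terms, limit), reserved = TRUE))
}
```

The service should answer with a JSON object keyed by the same `q0`, `q1`, ... ids. Each id holds a `result` array of candidates with `id`, `name`, `score` and `match` fields, so the answers can be joined back to the input rows.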

Ted

On Mon, Sep 29, 2014 at 9:37 AM, Simon Brown <[log in to unmask]> wrote:
> You could always web scrape, or download and then search the LCNAF with
> some script that looks like:
>
> #Build query for web scraping
> query = paste0("http://id.loc.gov/search/?q=", URLencode("corporate name here"),
>                "&q=cs%3Ahttp%3A%2F%2Fid.loc.gov%2Fauthorities%2Fnames")
>
> #Make the call
> result = readLines(query, warn = FALSE)
>
> #Find the lines containing "Corporate Name"
> lines = grep("Corporate Name", result, value = TRUE)
>
> #Alternatively use approximate string matching on the downloaded LCNAF data
> #(LCNAF_data_here: a character vector of headings)
> matches <- agrep("corporate name here", LCNAF_data_here, value = TRUE)
>
> #Parse for whatever info you want
> # ...
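The `agrep()` line above only reports which entries fall within a distance threshold. It does not rank them. Base R's `adist()` returns the generalized Levenshtein distance, which can be used to pick the single closest heading. This is a minimal sketch. It assumes the downloaded LCNAF labels have already been loaded into a character vector. The `max_dist` cutoff of 3 is an arbitrary assumption.

```r
# For each input term, return the closest heading and its edit distance.
# `headings` stands in for labels extracted from a downloaded LCNAF file.
closest_heading <- function(terms, headings, max_dist = 3) {
  d <- adist(terms, headings, ignore.case = TRUE)  # rows = terms, cols = headings
  best <- apply(d, 1, which.min)                   # column index of nearest heading
  dist <- d[cbind(seq_along(terms), best)]         # distance to that heading
  data.frame(
    term     = terms,
    heading  = ifelse(dist <= max_dist, headings[best], NA_character_),
    distance = dist,
    stringsAsFactors = FALSE
  )
}
```

Terms with no heading within `max_dist` come back as `NA`, so they can be routed to manual review.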
>
> My native programming language is R, so I hope functions like paste,
> readLines, grep, and URLencode are generic enough that other languages
> have something similar.  This can just be wrapped up in a for loop:
> for (i in 1:40000) {
>   # ...run the steps above for name i
> }
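A bare loop over all 40,000 names stops at the first failed request. Below is a minimal sketch of a more forgiving loop. It pauses between calls and records errors instead of aborting the run. `lookup_fn` is a hypothetical placeholder for whatever per-name query you write (scraping, VIAF, and so on). The one-second pause is an assumption, not a documented rate limit.

```r
# Apply a per-name lookup to every term, pausing between calls and
# recording errors rather than aborting the whole run.
lookup_all <- function(terms, lookup_fn, pause = 1) {
  results <- vector("list", length(terms))
  names(results) <- terms
  for (i in seq_along(terms)) {
    # results[i] <- list(...) keeps NULL results instead of deleting the slot
    results[i] <- list(tryCatch(
      lookup_fn(terms[[i]]),
      error = function(e) structure(conditionMessage(e), class = "lookup_error")
    ))
    if (pause > 0 && i < length(terms)) Sys.sleep(pause)
  }
  results
}
```

Afterwards, `Filter(function(x) inherits(x, "lookup_error"), results)` pulls out the names that need a retry.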
>
> Web scraping the results one name at a time would be SLOW, and an API is
> obviously the way to go, but it didn't look like the OCLC LCNAF API
> handled corporate names.  However, it sounds like someone in the previous
> message found a workaround.  Best of luck! -Simon
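id.loc.gov offers an API-style alternative to scraping search results. Its known-label retrieval service, at `https://id.loc.gov/authorities/names/label/<label>`, redirects to the authority record when a string exactly matches an authorized label. Below is a minimal sketch using base R's `curlGetHeaders()`, which needs R built with libcurl. Two points are assumptions that should be checked against the current service: that a match answers with a 3xx redirect carrying a `Location` header, and that a non-match answers with a non-3xx status.

```r
# Build a known-label URL for the LC Name Authority File.
label_url <- function(label) {
  paste0("https://id.loc.gov/authorities/names/label/",
         URLencode(label, reserved = TRUE))
}

# Return the redirect target if `label` looks authorized, else NA.
# Assumes a match answers with a 3xx status and a Location header.
check_label <- function(label) {
  h <- curlGetHeaders(label_url(label), redirect = FALSE)
  status <- attr(h, "status")
  if (status < 300 || status >= 400) return(NA_character_)
  loc <- grep("^location:", h, ignore.case = TRUE, value = TRUE)
  if (length(loc) == 0) return(NA_character_)
  trimws(sub("^[^:]+:", "", loc[[1]]))  # drop "location:" and CR/LF
}
```

This is still one HTTP request per name, so a pause between calls is advisable on a 40,000-name run.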
>
> On Mon, Sep 29, 2014 at 8:45 AM, Matt Carruthers <[log in to unmask]> wrote:
>
>> Hi Patrick,
>>
>> Over the last few weeks I've been doing something very similar.  I was able
>> to figure out a process that works using OpenRefine.  It works by searching
>> the VIAF API first, limiting results to anything that is a corporate name
>> and has an LC source authority.  OpenRefine then extracts the LCCN and puts
>> that through the LCNAF API that OCLC has to get the name.  I had to use
>> VIAF for the initial name search because for some reason the LCNAF API
>> doesn't really handle corporate names as search terms very well, but works
>> with the LCCN just fine (there is the possibility that I'm just doing
>> something wrong, and if that's the case, anyone on the list can feel free
>> to correct me).  In the end, you get the LC name authority that corresponds
>> to your search term and a link to the authority on the LC Authorities
>> website.
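The step between VIAF and LC depends on getting a clean LCCN. LCCNs pulled from other sources often carry internal blanks, hyphens, or trailing suffixes. Below is a minimal sketch of LC's published LCCN normalization rules, followed by building an id.loc.gov URI from the result. It does not cover the VIAF query itself. The URI pattern assumes the current id.loc.gov layout.

```r
# Normalize LCCNs per the Library of Congress rules: remove blanks, drop
# everything from the first "/" onward, and if a hyphen is present, remove
# it and left-pad the part after it with zeros to six digits.
normalize_lccn <- function(lccn) {
  x <- gsub("[[:space:]]", "", lccn)
  x <- sub("/.*$", "", x)
  has_hyphen <- grepl("-", x, fixed = TRUE)
  prefix <- sub("-.*$", "", x)       # part before the hyphen
  serial <- sub("^[^-]*-", "", x)    # part after the hyphen
  pad <- strrep("0", pmax(0, 6 - nchar(serial)))
  ifelse(has_hyphen, paste0(prefix, pad, serial), x)
}

# Turn an LCCN into an id.loc.gov name authority URI.
lcnaf_uri <- function(lccn) {
  paste0("https://id.loc.gov/authorities/names/", normalize_lccn(lccn))
}
```

For example, `normalize_lccn("n78-89035")` gives `"n78089035"`, and `normalize_lccn(" 79139101 /AC/r932")` gives `"79139101"`.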
>>
>> Anyway, the process is fairly simple to run (just prepare an Excel
>> spreadsheet and paste JSON commands into OpenRefine).  The only reservation
>> is that I don't think it will run all 40,000 of your names at once.  I've
>> been using it to run 300-400 names at a time.  That said, I'd be happy to
>> share what I did with you if you'd like to try it out.  I have some
>> instructions written up in a Word doc, and the JSON script is in a text
>> file, so just email me off list and I can send them to you.
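Matt runs 300-400 names per pass. Below is a sketch of splitting the full list into batches of that size and writing each batch to its own CSV, so each one can be opened as a separate OpenRefine project. The batch size of 350 and the `batch_NNN.csv` naming are arbitrary choices.

```r
# Split a long vector of names into batches of at most `size` entries,
# preserving the original order.
make_batches <- function(terms, size = 350) {
  split(terms, ceiling(seq_along(terms) / size))
}

# Write each batch to its own CSV with a single "name" column.
write_batches <- function(batches, dir = ".") {
  paths <- file.path(dir, sprintf("batch_%03d.csv", seq_along(batches)))
  for (i in seq_along(batches)) {
    write.csv(data.frame(name = batches[[i]], stringsAsFactors = FALSE),
              paths[i], row.names = FALSE)
  }
  paths
}
```

For 40,000 names at 350 per batch, this produces 115 files.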
>>
>> Matt
>>
>> Matt Carruthers
>> Metadata Projects Librarian
>> University of Michigan
>> 734-615-5047
>> [log in to unmask]
>>
>> On Fri, Sep 26, 2014 at 7:03 PM, Karen Hanson <[log in to unmask]> wrote:
>>
>> > I found the WorldCat Identities API useful for an institution name
>> > disambiguation project that I worked on a few years ago, though my goal
>> > wasn't to confirm whether names mapped to LCNAF.  The API response
>> > includes an LCCN, and you can set it to fuzzy or exact matching, but you
>> > would need to write a script to pass each term in and process the results:
>> >
>> > http://oclc.org/developer/develop/web-services/worldcat-identities.en.html
>> >
>> > I also can't speak to whether all LC Name Authorities are represented, so
>> > there may be a chance of some false negatives.
>> >
>> > OCLC has another API, but I'm not sure whether it covers corporate names:
>> > https://platform.worldcat.org/api-explorer/LCNAF
>> >
>> > I suspect there are others on the list that know more about the inner
>> > workings of these APIs if this might be an option for you... :)
>> >
>> > Karen
>> >
>> > -----Original Message-----
>> > From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
>> > Ethan Gruber
>> > Sent: Friday, September 26, 2014 3:54 PM
>> > To: [log in to unmask]
>> > Subject: Re: [CODE4LIB] Reconciling corporate names?
>> >
>> > I would check with the developers of SNAC
>> > (http://socialarchive.iath.virginia.edu/), as they've spent a lot of time
>> > developing named entity recognition scripts for personal and corporate
>> > names. They might have something you can reuse.
>> >
>> > Ethan
>> >
>> > On Fri, Sep 26, 2014 at 3:47 PM, Galligan, Patrick <[log in to unmask]> wrote:
>> >
>> > > I'm looking to reconcile about 40,000 corporate names against LCNAF to
>> > > see whether they are authorized strings or not, but I'm drawing a
>> > > blank about how to get it done.
>> > >
>> > > I've used http://freeyourmetadata.org/ for reconciling subject
>> > > headings before, but I can't get it to work for LCNAF. Has anyone had
>> > > experience with a project like this? I'd love to hear some ideas for
>> > > automatically dealing with a large data set like this, which we did not
>> > > create and for which we don't know how the names were formed.
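If a local list of authorized LCNAF labels is available (for example, extracted from a bulk download), then checking whether each string is an authorized form becomes a set-membership test. Below is a minimal sketch. The light normalization (trimming, collapsing internal whitespace, dropping one trailing period) is an assumption about which differences are safe to ignore. It will not catch variant forms that a fuzzy or VIAF-based match would.

```r
# Light cleanup so trivial spacing/punctuation differences don't block a match.
normalize_heading <- function(x) {
  x <- trimws(gsub("[[:space:]]+", " ", x))  # collapse and trim whitespace
  sub("\\.$", "", x)                         # drop a single trailing period
}

# TRUE where the (normalized) term appears among the authorized labels.
is_authorized <- function(terms, authorized_labels) {
  normalize_heading(terms) %in% normalize_heading(authorized_labels)
}
```

The names that come back `FALSE` are the candidates for fuzzy matching or manual review.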
>> > >
>> > > Thanks!
>> > >
>> > > -Patrick Galligan
>> > >
>> >
>>
>
>
>
> --
> Simon Brown
> [log in to unmask]
> simoncharlesbrown (Skype)
> 831.440.7466 (Phone)
>
> *Following our will and wind we may just go where no one's been -- MJK*