OCLC's FAST contains corporate names derived from LCSH.

http://www.oclc.org/research/activities/fast.html

I wrote a simple proxy to the FAST API that can be used as a reconciliation
endpoint in OpenRefine.

https://github.com/lawlesst/fast-reconcile

Ted

On Mon, Sep 29, 2014 at 9:37 AM, Simon Brown <[log in to unmask]> wrote:

> You could always web scrape, or download and then search the LCNAF with
> some script that looks like:
>
> #Build query for webscraping (sep = "" keeps paste from adding spaces to the URL)
> query = paste("http://id.loc.gov/search/?q=",
>               URLencode("corporate name here", reserved = TRUE),
>               "&q=cs%3Ahttp%3A%2F%2Fid.loc.gov%2Fauthorities%2Fnames", sep = "")
>
> #Make the call
> result = readLines(query)
>
> #Find the lines containing "Corporate Name"
> lines = grep("Corporate Name", result)
>
> #Alternatively use approximate string matching on the downloaded LCNAF data
> query <- agrep("corporate name here", LCNAF_data_here)
>
> #Parse for whatever info you want
> ...
>
> My native programming language is R so I hope the functions like paste,
> readLines, grep, and URLencode are generic enough for other languages to
> have some kind of similar thing. This can just be wrapped up into a for
> loop:
> for(i in 1:40000){...}
>
> Web scraping the results of one name at a time would be SLOW and obviously
> using an API is the way to go but it didn't look like the OCLC LCNAF API
> handled Corporate Name. However, it sounds like in the previous message
> someone found a workaround. Best of luck! -Simon
>
> On Mon, Sep 29, 2014 at 8:45 AM, Matt Carruthers <[log in to unmask]> wrote:
>
>> Hi Patrick,
>>
>> Over the last few weeks I've been doing something very similar. I was able
>> to figure out a process that works using OpenRefine. It works by searching
>> the VIAF API first, limiting results to anything that is a corporate name
>> and has an LC source authority. OpenRefine then extracts the LCCN and puts
>> that through the LCNAF API that OCLC has to get the name.
>> I had to use VIAF for the initial name search because for some reason the
>> LCNAF API doesn't really handle corporate names as search terms very well,
>> but works with the LCCN just fine (there is the possibility that I'm just
>> doing something wrong, and if that's the case, anyone on the list can feel
>> free to correct me). In the end, you get the LC name authority that
>> corresponds to your search term and a link to the authority on the LC
>> Authorities website.
>>
>> Anyway, the process is fairly simple to run (just prepare an Excel
>> spreadsheet and paste JSON commands into OpenRefine). The only reservation
>> is that I don't think it will run all 40,000 of your names at once. I've
>> been using it to run 300-400 names at a time. That said, I'd be happy to
>> share what I did with you if you'd like to try it out. I have some
>> instructions written up in a Word doc, and the JSON script is in a text
>> file, so just email me off list and I can send them to you.
>>
>> Matt
>>
>> Matt Carruthers
>> Metadata Projects Librarian
>> University of Michigan
>> 734-615-5047
>> [log in to unmask]
>>
>> On Fri, Sep 26, 2014 at 7:03 PM, Karen Hanson <[log in to unmask]> wrote:
>>
>> > I found the WorldCat Identities API useful for an institution name
>> > disambiguation project that I worked on a few years ago, though my goal
>> > wasn't to confirm whether names mapped to LCNAF. The API response
>> > includes an LCCN, and you can set it to fuzzy or exact matching, but you
>> > would need to write a script to pass each term in and process the results:
>> >
>> > http://oclc.org/developer/develop/web-services/worldcat-identities.en.html
>> >
>> > I also can't speak to whether all LC Name Authorities are represented, so
>> > there may be a chance of some false negatives.
>> >
>> > OCLC has another API, but not sure if it covers corporate names:
>> > https://platform.worldcat.org/api-explorer/LCNAF
>> >
>> > I suspect there are others on the list that know more about the inner
>> > workings of these APIs if this might be an option for you... :)
>> >
>> > Karen
>> >
>> > -----Original Message-----
>> > From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
>> > Ethan Gruber
>> > Sent: Friday, September 26, 2014 3:54 PM
>> > To: [log in to unmask]
>> > Subject: Re: [CODE4LIB] Reconciling corporate names?
>> >
>> > I would check with the developers of SNAC
>> > (http://socialarchive.iath.virginia.edu/), as they've spent a lot of time
>> > developing named entity recognition scripts for personal and corporate
>> > names. They might have something you can reuse.
>> >
>> > Ethan
>> >
>> > On Fri, Sep 26, 2014 at 3:47 PM, Galligan, Patrick <[log in to unmask]>
>> > wrote:
>> >
>> > > I'm looking to reconcile about 40,000 corporate names against LCNAF to
>> > > see whether they are authorized strings or not, but I'm drawing a
>> > > blank about how to get it done.
>> > >
>> > > I've used http://freeyourmetadata.org/ for reconciling subject
>> > > headings before, but I can't get it to work for LCNAF. Has anyone had
>> > > any experience in a project like this? I'd love to hear some ideas for
>> > > automatically dealing with a large data set like this that we did not
>> > > create and do not know how the names were created.
>> > >
>> > > Thanks!
>> > >
>> > > -Patrick Galligan
>
> --
> Simon Brown
> [log in to unmask]
> simoncharlesbrown (Skype)
> 831.440.7466 (Phone)
>
> *Following our will and wind we may just go where no one's been -- MJK*
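
For anyone who would rather script Simon's approach outside R, here is a minimal Python sketch of the two steps he describes: building the id.loc.gov search URL for one name, and approximate matching against a locally downloaded list of LCNAF headings. The function names (build_search_url, match_names), the sample headings, and the 0.85 cutoff are assumptions made up for illustration. difflib stands in for R's agrep and will not score matches the same way.

```python
from difflib import get_close_matches
from urllib.parse import quote

# Search prefix and name-authority filter copied from Simon's R snippet.
SEARCH_BASE = "http://id.loc.gov/search/?q="
NAMES_FILTER = "&q=cs%3Ahttp%3A%2F%2Fid.loc.gov%2Fauthorities%2Fnames"


def build_search_url(name):
    """Return the id.loc.gov search URL for one corporate name."""
    # safe="" percent-encodes every reserved character, including "/" and "&".
    return SEARCH_BASE + quote(name.strip(), safe="") + NAMES_FILTER


def match_names(names, headings, cutoff=0.85):
    """Map each input name to its closest heading, or None if nothing is close.

    Comparison is case-insensitive. The cutoff is a hypothetical starting
    value and should be tuned against a hand-checked sample.
    """
    # Index headings by lowercase form so the original spelling can be returned.
    by_lower = {h.lower(): h for h in headings}
    candidates = list(by_lower)
    results = {}
    for name in names:
        hits = get_close_matches(name.lower(), candidates, n=1, cutoff=cutoff)
        results[name] = by_lower[hits[0]] if hits else None
    return results


if __name__ == "__main__":
    # Made-up sample data; real headings would come from the LCNAF download.
    headings = ["Ford Foundation", "Rockefeller Foundation"]
    names = ["Rockefeller Foundaton", "Acme Widgets Inc."]
    print(build_search_url("Ford Foundation"))
    for name, heading in match_names(names, headings).items():
        print(name, "->", heading)
```

Names that come back as None are candidates for the slower API or web lookup. Names that do match still deserve a spot check, since a close string is not proof that the heading is the authorized form for the same body.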