On Feb 25, 2013, at 8:12 AM, Seth van Hooland wrote:

> You want to automate the discovery of people, place names and events within a large corpus of unstructured documents or metadata (e.g. description field)? Then you might want to use the Named-Entity Recognition (NER) extension for OpenRefine that has been developed by Multimedia Lab (ELIS — Ghent University / iMinds) and MasTIC (Université Libre de Bruxelles).

Yes, named-entity recognition (NER) is fun. 

About a year ago I used a different application to do NER against about 100 digitized files. From my blog posting [0]:

  named-entity extraction – There was a desire to list the
  underlying names, places, and organizations from each text. These
  things can put a text into a context for the reader. Are there a
  lot of Irish names? Is there a preponderance of place names from
  the United States? To accomplish this task and assist in
  answering these sorts of questions, a Perl script was written
  around the Stanford Named Entity Recognizer. [1] This script [2]
  extracts the entities, looks them up in DBpedia, and saves
  metadata (abstracts, URLs to images, as well as latitudes &
  longitudes) describing the entities to a locally defined XML file
  for later processing. (See an example. [3]) A CGI script (ner.cgi [4])
  was then written to provide a reader-interface to these files.
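The original script was written in Perl and its source is only linked, not shown. What follows is a minimal Python sketch of the extract-and-save step. It assumes the tagger emits slash-tagged text (word/TAG, with O marking non-entities). It merges consecutive tokens that share a tag into one entity and counts how often each distinct entity appears, which is the kind of tally that answers "are there a lot of Irish names?". The function names and XML element names are hypothetical. The DBpedia lookup (abstracts, image URLs, latitudes and longitudes) is omitted because it needs network access.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def extract_entities(tagged_text):
    """Group consecutive word/TAG tokens sharing a non-O tag into (name, tag) pairs."""
    entities = []
    current_words, current_tag = [], None
    for token in tagged_text.split():
        word, sep, tag = token.rpartition("/")
        if not sep:  # no "/" in the token: treat it as outside any entity
            word, tag = token, "O"
        if tag != "O" and tag == current_tag:
            current_words.append(word)  # continue the current multi-word entity
            continue
        if current_words:  # the previous entity just ended
            entities.append((" ".join(current_words), current_tag))
        current_words, current_tag = ([word], tag) if tag != "O" else ([], None)
    if current_words:  # flush an entity that ends the text
        entities.append((" ".join(current_words), current_tag))
    return entities


def entities_to_xml(doc_id, entities):
    """Serialize one <entity> element per distinct (name, tag), most frequent first."""
    root = ET.Element("document", id=doc_id)  # hypothetical element names
    for (name, tag), count in Counter(entities).most_common():
        ET.SubElement(root, "entity", type=tag, count=str(count)).text = name
    return ET.tostring(root, encoding="unicode")


tagged = "Murphy/PERSON sailed/O from/O Dublin/LOCATION to/O Boston/LOCATION ./O"
print(entities_to_xml("doc1", extract_entities(tagged)))
```

One limitation of the slash-tag format is that two different entities of the same type written side by side get merged into one. A real pipeline would take entity boundaries from the tagger's richer output modes instead.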

Once I had "NER'ed" the files and saved the corresponding linked data, I was able to create a tablet-based interface that lets the reader not only see how the words are used in context but also read a blurb from Wikipedia and map places via Google Maps. For example, see the extracts from a book called An Adventure With The Apaches [5], although the data is not as clean as I would hope. The whole thing was part of a project we called the Catholic Youth Literature Project. [6]
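Seeing "how the words are used in context" is essentially a keyword-in-context (KWIC) concordance. The CGI interface's actual code is not shown here. The sketch below is a generic version of the idea. It finds each whole-word occurrence of an entity name and returns it with a fixed number of characters on either side. The function name and the `width` parameter are hypothetical.

```python
import re


def keyword_in_context(text, keyword, width=30):
    """Return one line per whole-word match of keyword, padded with surrounding text."""
    text = " ".join(text.split())  # collapse newlines and runs of spaces
    pattern = r"\b" + re.escape(keyword) + r"\b"
    lines = []
    for match in re.finditer(pattern, text):
        start, end = match.span()
        left = text[max(0, start - width):start]   # context before the match
        right = text[end:end + width]              # context after the match
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines


for line in keyword_in_context("The priest left Dublin in May and came back to Dublin.", "Dublin", 15):
    print(line)
```

Right-aligning the left context lines up every match in one column, which is the traditional concordance layout and makes repeated usage patterns easy to scan.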

The ELIS software looks pretty interesting. [7]

Fun with distant reading and NER.

[0] blog posting -
[1] Stanford NER -
[2] -
[3] intermediate XML file -
[4] CGI script -
[5] Adventure -
[6] Catholic Youth Literature -
[7] ELIS -

Eric Lease Morgan
University of Notre Dame