Eric,
Jean Godby and I have been looking into this very problem. First, I
want to draw your attention to the difference between NER and the
subsequent problem of Identity Resolution. For example, in a given
text, an NER tool would identify "Kennedy" as a name, but that name
could refer to several different people. If you're able to get more
information (dates, titles, etc.) from the text for a given reference,
you can do a better job of resolving the correct identity. Second,
Jean and I planned to use WorldCat Identities [1] as our endpoint and
as a part of our identity resolution mechanism. With extra data, like
a birth and/or death year, you can really zero in on an identity.
[1] http://www.worldcat.org/identities
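To make the NER vs. identity resolution distinction concrete, here is a rough Python sketch of the filtering step. It is only an illustration under stated assumptions: the Candidate class, the resolve() function, and the hard-coded candidate list are hypothetical stand-ins, not the WorldCat Identities API or its output. The idea is that a bare surface form matches several people, and a birth or death year pulled from the surrounding text narrows the list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """A hypothetical identity record: a heading plus optional life dates."""
    name: str
    birth: Optional[int] = None
    death: Optional[int] = None


def resolve(surface: str,
            candidates: List[Candidate],
            birth: Optional[int] = None,
            death: Optional[int] = None) -> List[Candidate]:
    """Return candidates whose name contains the surface form and whose
    known years do not contradict the hints extracted from the text."""
    matches = []
    for c in candidates:
        # Step 1: the name found by NER must appear in the heading.
        if surface.lower() not in c.name.lower():
            continue
        # Step 2: drop candidates whose known years conflict with a hint.
        if birth is not None and c.birth is not None and c.birth != birth:
            continue
        if death is not None and c.death is not None and c.death != death:
            continue
        matches.append(c)
    return matches


# Illustrative candidate list (assumed; a real one would come from a lookup).
CANDIDATES = [
    Candidate("Kennedy, John F.", 1917, 1963),
    Candidate("Kennedy, Robert F.", 1925, 1968),
    Candidate("Kennedy, Edward M.", 1932, 2009),
]

# The name alone is ambiguous; a death year singles out one identity.
print(len(resolve("Kennedy", CANDIDATES)))
print([c.name for c in resolve("Kennedy", CANDIDATES, death=1968)])
```

An exact year match is the simplest possible rule. With messier source text you would probably want a tolerance of a year or two, or a score instead of a hard filter.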
/dev
--
Devon Smith
Consulting Software Engineer
OCLC Office of Research
http://www.oclc.org/research/people/smith.htm
On Mon, May 16, 2011 at 8:33 AM, Eric Lease Morgan <[log in to unmask]> wrote:
> What are some of the ways to best insert Linked Data endpoints into an XML file?
>
> I have been playing lately with named-entity recognition/extraction technology. [1] Feed a text file, such as a novel, into the recognition program. Get back a rudimentary XML file where things like names, places, and organizations are marked with simple tags. I can then extract all the place names from a text, tabulate them, display a word-cloud, allow the reader to select items, guess latitude and longitude of the place, and finally plot them on a map. [2] This process works pretty well, but Google Maps only allows me to plot a limited number of items at a time. Consequently, I am thinking about preprocessing my data by looping through the XML file and adding latitude and longitude attributes to the place name elements.
>
> I then got to thinking about names and organizations. It would be nice to supplement these entities with canonical Linked Data endpoints. My application could then read the endpoints, extract the links associated with them, and display some sort of graphic illustrating relationships. Finally, I could allow the reader to select a relationship for further investigation.
>
> Given a name -- say, Plato or Thoreau -- how would one go about identifying good endpoints? What sort of query would I send to what sort of "database"? What might I get back? Assuming my goal is to enrich the text, what sort of link(s) should I insert into my XML?
>
> [1] NER - http://bit.ly/e0SnA6
> [2] geo-location for WebKit mobile - http://bit.ly/msIu16
>
> --
> Eric Morgan
> University of Notre Dame
>