What are some good ways to insert Linked Data endpoints into an XML file?
I have been playing lately with named-entity recognition/extraction technology. [1] I feed a text file, such as a novel, into the recognition program and get back a rudimentary XML file in which things like names, places, and organizations are marked with simple tags. I can then extract all the place names from a text, tabulate them, display a word cloud, allow the reader to select items, guess the latitude and longitude of each place, and finally plot the places on a map. [2] This process works pretty well, but Google Maps only allows me to plot a limited number of items at a time. Consequently, I am thinking about preprocessing my data by looping through the XML file and adding latitude and longitude attributes to the place name elements.
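As a rough illustration of that preprocessing step, here is a minimal Python sketch. It assumes the recognizer marks places with a LOCATION element and that coordinates come from a small hand-built lookup table. The tag name, the lat and lon attribute names, the GAZETTEER table, and the add_coordinates function are all hypothetical, and the coordinates are approximate values included only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical gazetteer: place name -> (latitude, longitude).
# Values are approximate and only for illustration; a real application
# would look them up with a geocoding service.
GAZETTEER = {
    "Concord": (42.4604, -71.3489),
    "Athens": (37.9838, 23.7275),
}


def add_coordinates(xml_text, tag="LOCATION", gazetteer=GAZETTEER):
    """Return xml_text with lat/lon attributes added to known place elements.

    The tag name "LOCATION" is an assumption about the recognizer's output.
    Places missing from the gazetteer are left unchanged.
    """
    root = ET.fromstring(xml_text)
    for element in root.iter(tag):
        name = (element.text or "").strip()
        coords = gazetteer.get(name)
        if coords is None:
            continue  # unknown place: leave the element as it is
        lat, lon = coords
        element.set("lat", str(lat))
        element.set("lon", str(lon))
    return ET.tostring(root, encoding="unicode")


sample = (
    "<text>Thoreau lived near <LOCATION>Concord</LOCATION> "
    "and never saw <LOCATION>Atlantis</LOCATION>.</text>"
)
enriched = add_coordinates(sample)
print(enriched)
```

With the coordinates stored in the file, the mapping code could read them directly instead of guessing locations at display time.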
I then got to thinking about names and organizations. It would be nice to supplement these entities with canonical Linked Data endpoints. My application could then read the endpoints, extract the links associated with them, and display some sort of graphic illustrating relationships. Finally, I could allow the reader to select a relationship for further investigation.
Given a name -- say, Plato or Thoreau -- how would one go about identifying good endpoints? What sort of query would I send to what sort of "database"? What might I get back? Assuming my goal is to enrich the text, what sort of link(s) should I insert into my XML?
[1] NER - http://bit.ly/e0SnA6
[2] geo-location for WebKit mobile - http://bit.ly/msIu16
--
Eric Morgan
University of Notre Dame