Hi Eric,
I'm not sure your question can be answered without some more
information first (and, possibly, more mining on your part).
First off, these "names" -- can they be fairly confidently identified
as identities? If so, VIAF (http://viaf.org/) would be your first
step. VIAF has some links to dbpedia/wikipedia (at least it used to),
but it doesn't seem to have any for either of your examples (Plato,
Thoreau). It does, however, provide owl:sameAs links to the National
Library of Sweden's and the German National Library's resources, which
have much better coverage of dbpedia/wikipedia.
You might also want to look at OpenCalais (http://www.opencalais.com/)
for identifying resources in your text. Sindice (http://sindice.com/)
is another option, but to be useful, you're going to need to filter
its results considerably.
Anyway, you're going to have to know what you have and what sort of
thing you're hoping to link to before you can do much.
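
Once you have settled on identifiers, the enrichment step itself could
be fairly simple. Here is a rough sketch in Python, only to illustrate
the idea. The element names (person, location), the "ref" attribute,
and the example.org URIs are all made up for illustration; in practice
the lookup table would be filled from whatever source you choose (VIAF,
for instance), and the markup would be whatever your NER tool emits.

```python
import xml.etree.ElementTree as ET

# Hypothetical NER output; the element names are assumptions, not the
# actual markup produced by any particular tool.
SAMPLE = (
    "<text><person>Plato</person> influenced <person>Thoreau</person>, "
    "who lived near <location>Concord</location>.</text>"
)

# Hypothetical lookup table mapping a name to an identity URI.
# The example.org URIs are placeholders, not real identifiers.
IDENTITY_URIS = {
    "Plato": "http://example.org/identity/plato",
    "Thoreau": "http://example.org/identity/thoreau",
}


def enrich(xml_text, uris, tag="person", attribute="ref"):
    """Add a URI attribute to every <tag> element whose text is a known name."""
    root = ET.fromstring(xml_text)
    for element in root.iter(tag):
        name = (element.text or "").strip()
        uri = uris.get(name)
        if uri is not None:
            # Names without a confident match are left untouched.
            element.set(attribute, uri)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(enrich(SAMPLE, IDENTITY_URIS))
```

Leaving unmatched names alone, rather than guessing, keeps the file
honest about which entities you were actually able to identify.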
-Ross.
On Mon, May 16, 2011 at 8:33 AM, Eric Lease Morgan wrote:
> What are some of the ways to best insert Linked Data endpoints into an XML file?
>
> I have been playing lately with named-entity recognition/extraction technology. [1] Feed a text file, such as a novel, into the recognition program. Get back a rudimentary XML file where things like names, places, and organizations are marked with simple tags. I can then extract all the place names from a text, tabulate them, display a word-cloud, allow the reader to select items, guess latitude and longitude of the place, and finally plot them on a map. [2] This process works pretty well, but Google Maps only allows me to plot a limited number of items at a time. Consequently, I am thinking about preprocessing my data by looping through the XML file and adding latitude and longitude attributes to the place name elements.
>
> I then got to thinking about names and organizations. It would be nice to supplement these entities with canonical Linked Data endpoints. My application could then read the endpoints, extract the links associated with them, and display some sort of graphic illustrating relationships. Finally, I could allow the reader to select a relationship for further investigation.
>
> Given a name -- say, Plato or Thoreau -- how would one go about identifying good endpoints? What sort of query would I send to what sort of "database"? What might I get back? Assuming my goal is to enrich the text, what sort of link(s) should I insert into my XML?
>
> [1] NER - http://bit.ly/e0SnA6
> [2] geo-location for WebKit mobile - http://bit.ly/msIu16
>
> --
> Eric Morgan
> University of Notre Dame
>