Hi Eric,
If you think Wikipedia articles could serve as good endpoints for your
purposes, have a look at this open-source tool:
http://wikipedia-miner.sourceforge.net/
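As a rough illustration of using Wikipedia articles as endpoints, here is a minimal sketch that guesses a Wikipedia article URL and the matching DBpedia resource URI from a name. It assumes only Wikipedia's title conventions (underscores for spaces, capitalized first letter) and DBpedia's practice of minting resource URIs from those titles. The helper names are hypothetical, and the results are unverified guesses: a bare surname such as "Thoreau" may hit a redirect or a disambiguation page, which is the ambiguity a tool like Wikipedia Miner aims to resolve.

```python
from urllib.parse import quote

def wikipedia_title(name: str) -> str:
    """Apply Wikipedia title conventions: underscores for spaces, capital first letter."""
    title = "_".join(name.split())
    return title[:1].upper() + title[1:]

def candidate_endpoints(name: str) -> dict:
    """Return guessed (unverified) Wikipedia and DBpedia URIs for a name."""
    # Leave a few punctuation characters common in titles unescaped.
    encoded = quote(wikipedia_title(name), safe="/(),'")
    return {
        "wikipedia": "https://en.wikipedia.org/wiki/" + encoded,
        "dbpedia": "http://dbpedia.org/resource/" + encoded,
    }

for name in ["Plato", "Henry David Thoreau"]:
    print(name, candidate_endpoints(name))
```

Each guess should still be checked (for example, by requesting the URI) before it is written into the XML.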
-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
Eric Lease Morgan
Sent: 16 May 2011 13:34
To: [log in to unmask]
Subject: [CODE4LIB] linked data endpoints
What are the best ways to insert Linked Data endpoints into an XML
file?
I have been playing lately with named-entity recognition/extraction
technology. [1] Feed a text file, such as a novel, into the recognition
program. Get back a rudimentary XML file where things like names,
places, and organizations are marked with simple tags. I can then
extract all the place names from a text, tabulate them, display a
word cloud, allow the reader to select items, guess the latitude and
longitude of each place, and finally plot them on a map. [2] This process
works pretty well, but Google Maps only allows me to plot a limited
number of items at a time. Consequently, I am thinking about
preprocessing my data by looping through the XML file and adding
latitude and longitude attributes to the place name elements.
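The tabulation and preprocessing steps described above might look roughly like the sketch below. It assumes the recognizer marks places with `<LOCATION>` elements inside a single root element (the tag names and the sample text are assumptions about the NER output, not taken from it), and it uses a small hypothetical gazetteer of approximate, hard-coded coordinates in place of a real geocoding step.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical NER output: the <text> root and <LOCATION>/<PERSON> tags are assumptions.
SAMPLE = """<text>
I went to <LOCATION>Concord</LOCATION> and then to <LOCATION>Boston</LOCATION>,
then back to <LOCATION>Concord</LOCATION> with <PERSON>Emerson</PERSON>.
</text>"""

# Hypothetical gazetteer with approximate coordinates, hard-coded for illustration;
# in practice these values would come from a geocoding service.
GAZETTEER = {
    "Concord": (42.46, -71.35),
    "Boston": (42.36, -71.06),
}

def tabulate_places(root: ET.Element) -> Counter:
    """Count how often each place name occurs (e.g. to size a word cloud)."""
    return Counter(el.text.strip() for el in root.iter("LOCATION") if el.text)

def add_coordinates(root: ET.Element, gazetteer: dict) -> int:
    """Add lat/lng attributes to each place element found in the gazetteer."""
    added = 0
    for el in root.iter("LOCATION"):
        coords = gazetteer.get((el.text or "").strip())
        if coords is not None:
            el.set("lat", f"{coords[0]:.4f}")
            el.set("lng", f"{coords[1]:.4f}")
            added += 1
    return added

root = ET.fromstring(SAMPLE)
print(tabulate_places(root).most_common())
add_coordinates(root, GAZETTEER)
print(ET.tostring(root, encoding="unicode"))
```

Doing the lookup once, ahead of time, means the map page only has to read attributes that are already in the file.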
I then got to thinking about names and organizations. It would be nice
to supplement these entities with canonical Linked Data endpoints. My
application could then read the endpoints, extract the links associated
with them, and display some sort of graphic illustrating relationships.
Finally, I could allow the reader to select a relationship for further
investigation.
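One way to picture the enrichment and relationship steps is the sketch below. It assumes a hypothetical lookup table from recognized names to canonical URIs, stores each URI in a hypothetical `uri` attribute (a namespaced attribute such as XLink's `xlink:href` would be another option), and stands in for links harvested from the endpoints with a few hypothetical subject/predicate/object triples. It then keeps only the links whose subject actually occurs in the text, which is the raw material for a relationship graphic.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DBR = "http://dbpedia.org/resource/"

# Hypothetical tagged text and lookup table; the URIs follow DBpedia's naming
# pattern but would need to be verified.
SAMPLE = "<text><PERSON>Plato</PERSON> writes about <PERSON>Socrates</PERSON>.</text>"
CANONICAL = {"Plato": DBR + "Plato", "Socrates": DBR + "Socrates"}

# Hypothetical triples standing in for links retrieved from the endpoints.
TRIPLES = [
    (DBR + "Plato", "influencedBy", DBR + "Socrates"),
    (DBR + "Aristotle", "influencedBy", DBR + "Plato"),
]

def link_entities(root, lookup, tags=("PERSON", "ORGANIZATION")):
    """Add a (hypothetical) 'uri' attribute to each known name or organization."""
    linked = 0
    for tag in tags:
        for el in root.iter(tag):
            uri = lookup.get((el.text or "").strip())
            if uri:
                el.set("uri", uri)
                linked += 1
    return linked

def relationships(root, triples):
    """Group triples by subject, keeping only subjects linked in this text."""
    in_text = {el.get("uri") for el in root.iter() if el.get("uri")}
    graph = defaultdict(list)
    for subj, pred, obj in triples:
        if subj in in_text:
            graph[subj].append((pred, obj))
    return dict(graph)

root = ET.fromstring(SAMPLE)
link_entities(root, CANONICAL)
for subj, links in relationships(root, TRIPLES).items():
    print(subj, links)
```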
Given a name -- say, Plato or Thoreau -- how would one go about
identifying good endpoints? What sort of query would I send to what sort
of "database"? What might I get back? Assuming my goal is to enrich the
text, what sort of link(s) should I insert into my XML?
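As one possible answer to "what query, sent to what database", here is a minimal sketch that builds a SPARQL query against DBpedia, asking for resources whose English `rdfs:label` matches a name, and parses a reply in the W3C SPARQL JSON results format. The endpoint URL and the `query`/`format` request parameters are assumptions about DBpedia's public service, and `SAMPLE_RESPONSE` is a hypothetical reply, not real output. The network call is left commented out.

```python
import json
from urllib.parse import urlencode

# Assumed public DBpedia SPARQL endpoint; availability is not guaranteed.
ENDPOINT = "https://dbpedia.org/sparql"

def label_query(name: str, lang: str = "en", limit: int = 10) -> str:
    """Build a SPARQL query for resources whose rdfs:label equals the name."""
    literal = name.replace("\\", "\\\\").replace('"', '\\"')  # escape for a SPARQL string
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        f'SELECT DISTINCT ?s WHERE {{ ?s rdfs:label "{literal}"@{lang} }}\n'
        f"LIMIT {limit}"
    )

def request_url(query: str) -> str:
    """Encode the query as a GET request asking for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return ENDPOINT + "?" + urlencode(params)

def uris_from_results(body: str) -> list:
    """Extract URI bindings for ?s from a SPARQL JSON results document."""
    data = json.loads(body)
    return [b["s"]["value"] for b in data["results"]["bindings"]
            if b.get("s", {}).get("type") == "uri"]

# Hypothetical reply, shaped like the SPARQL 1.1 JSON results format.
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["s"]},
    "results": {"bindings": [
        {"s": {"type": "uri", "value": "http://dbpedia.org/resource/Plato"}}
    ]},
})

print(request_url(label_query("Plato")))
print(uris_from_results(SAMPLE_RESPONSE))
# To query live (network access required):
# from urllib.request import urlopen
# print(uris_from_results(urlopen(request_url(label_query("Plato"))).read().decode()))
```

A URI returned this way is the kind of link that could then be stored as an attribute on the matching element.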
[1] NER - http://bit.ly/e0SnA6
[2] geo-location for WebKit mobile - http://bit.ly/msIu16
--
Eric Morgan
University of Notre Dame