On May 17, 2011, at 11:22 AM, Eric Lease Morgan wrote:

>> What are some of the ways to best insert Linked Data endpoints into an
>> XML file?... Given a name -- say, Plato or Thoreau -- how would one go about
>> identifying good endpoints?
>
> When and if I do this work, I think I will use DBpedia and their lookup service. [1] Here's how:
>
>   * do named-entity recognition (NER) against my documents
>   * for each name, place or organization element in the resulting XML
>     o query DBpedia for URIs via their lookup service
>     o add 1 or more of the resulting URIs as attributes
>       of the name, place, or organization element
>   * end for
>
> Once done I could use the enhanced XML file as the raw source for providing cool (and "kewl") services against the text -- word clouds, definitions, geo-locations, images, abstracts, find similar, purchase, print, do concordance against, etc.

I've made some progress towards enriching my documents with Linked Data endpoints. Using the Stanford NER, I am able to create a rudimentary XML stream where the names, places, and organizations are marked up. [1] I then modify the XML to include tallies of the entities as well as the most significant links from DBpedia. Finally, I output the resulting XML to STDOUT. This process works for any plain text. See txt2ner.pl. [2]

I've created about six .ner files. [3] The idea is then to allow the reader to: 1) read the document, 2) see at a glance what named entities exist in the document, and 3) do things with the named entities. I started writing such an interface for desktop browsers, but the real goal is to create one for tablet devices. [4, 5] I got a bit stymied on both.

In the end I hope to allow the person to select a named entity, automatically retrieve the content of the Linked Data endpoint, and return a palette of choices allowing the reader to see a map, display a picture, get a definition, find related items, purchase the item, print the item, etc.
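For readers who want to try something similar, here is a minimal sketch of the enrichment step described above (tallying the entities and attaching links to them). It is written in Python rather than the Perl of txt2ner.pl, and it is not the actual script. The element names (person, place, organization), the count and uri attributes, and the stub_lookup function are all assumptions made for illustration; a real version would replace the stub with a query to the DBpedia lookup service.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed element names for the NER markup; the real .ner files may differ.
ENTITY_TAGS = ("person", "place", "organization")


def enrich(xml_text, lookup, max_uris=1):
    """Add a tally and candidate URIs to every named-entity element.

    `lookup` is any callable that maps a name to a list of URIs, so the
    network call can be swapped out or stubbed.
    """
    root = ET.fromstring(xml_text)
    entities = [el for el in root.iter() if el.tag in ENTITY_TAGS]

    # Tally how often each (type, name) pair occurs in the document.
    tallies = Counter((el.tag, (el.text or "").strip()) for el in entities)

    cache = {}  # look each distinct name up only once
    for el in entities:
        name = (el.text or "").strip()
        el.set("count", str(tallies[(el.tag, name)]))  # hypothetical attribute
        if name not in cache:
            cache[name] = lookup(name)[:max_uris]
        if cache[name]:
            el.set("uri", " ".join(cache[name]))  # hypothetical attribute
    return ET.tostring(root, encoding="unicode")


def stub_lookup(name):
    """Hypothetical stand-in for a call to the DBpedia lookup service."""
    known = {
        "Plato": ["http://dbpedia.org/resource/Plato"],
        "Athens": ["http://dbpedia.org/resource/Athens"],
    }
    return known.get(name, [])


sample = (
    "<text><person>Plato</person> taught in <place>Athens</place>. "
    "<person>Plato</person> wrote dialogues.</text>"
)
enriched = enrich(sample, stub_lookup)
print(enriched)
```

Passing the lookup in as a function keeps the XML handling separate from the network call. Note that the stub simply returns the first match for a name, which sidesteps the disambiguation problem entirely.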
As alluded to previously in this thread, one of the bigger challenges will be disambiguation. I see a crowdsourced solution in my future.

I want to thank the XML4Lib community for helping me out with some of what I thought was gnarly XPath syntax. The group was VERY responsive and really accurate. "Thank you!"

[1] NER - http://bit.ly/e0SnA6
[2] txt2ner.pl - http://bit.ly/jQRjRH
[3] .ner files - http://bit.ly/lJ8wKU
[4] desktop interface - http://bit.ly/k4U6SZ
[5] tablet interface - http://bit.ly/kueBm9

-- 
Eric Lease Morgan
University of Notre Dame

Great Books Survey -- http://bit.ly/auPD9Q