> What are some of the ways to best insert Linked Data endpoints into an
> XML file?... Given a name -- say, Plato or Thoreau -- how would one go about
> identifying good endpoints? What sort of query would I send to what sort
> of "database"? What might I get back? Assuming my goal is to enrich the
> text, what sort of link(s) should I insert into my XML?
Thank you for the helpful replies.
When and if I do this work, I think I will use DBpedia and their lookup service. [1] Here's how:
* do named-entity recognition (NER) against my documents
* for each name, place, or organization element in the resulting XML
  o query DBpedia for URIs via their lookup service
  o add 1 or more of the resulting URIs as attributes
    of the name, place, or organization element
* end for
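
As a rough illustration of the loop above, here is a minimal Python sketch. It assumes the NER step has already wrapped entities in <name>, <place>, or <organization> elements. The attribute name "uris", the function names, and the lookup URL and JSON shape ("docs" entries carrying a "resource" list) are my assumptions, not something confirmed against the service in [1], so treat the lookup function as a placeholder to adapt.

```python
import json
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumed element names produced by the NER step.
ENTITY_TAGS = {"name", "place", "organization"}


def dbpedia_lookup(label, max_results=3):
    """Return candidate URIs for a label.

    The endpoint and the response layout used here are assumptions;
    check them against the lookup service's own documentation.
    """
    params = urllib.parse.urlencode(
        {"query": label, "format": "JSON", "maxResults": max_results}
    )
    url = "https://lookup.dbpedia.org/api/search?" + params
    with urllib.request.urlopen(url, timeout=10) as response:
        data = json.load(response)
    return [uri for doc in data.get("docs", []) for uri in doc.get("resource", [])]


def enrich(xml_text, lookup=dbpedia_lookup, attribute="uris"):
    """Add candidate URIs, space-separated, to each entity element."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if element.tag in ENTITY_TAGS:
            label = "".join(element.itertext()).strip()
            uris = lookup(label) if label else []
            if uris:
                element.set(attribute, " ".join(uris))
    return ET.tostring(root, encoding="unicode")
```

Because the lookup is passed in as a function, the same loop can be tried offline with a stand-in, for example enrich(xml, lookup=lambda label: ["http://dbpedia.org/resource/" + label]), before pointing it at the real service.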
Once done, I could use the enhanced XML file as the raw source for providing cool (and "kewl") services against the text -- word clouds, definitions, geo-locations, images, abstracts, find similar, purchase, print, do concordance against, etc.
In the meantime, if I want to disambiguate, I could go any number of routes. For example, I could crowdsource the XML file, allowing people to select the "correct" URI from each attribute listing. Alternatively, I could probably look for relationships between all the URIs in all the attributes and somehow statistically select the "correct" one. Whatever.
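
One very crude way to picture the "statistical" route is sketched below. It is a stand-in, not a real relationship analysis: for each mention it picks the candidate URI that shows up in the candidate lists of the most mentions across the document, falling back to the first candidate on ties. The function name and the input shape (one list of candidate URIs per mention) are hypothetical.

```python
from collections import Counter


def pick_by_frequency(candidates_per_mention):
    """Choose one URI per mention by document-wide frequency.

    candidates_per_mention is a list of lists of candidate URIs,
    one inner list per mention. A mention with no candidates
    yields None.
    """
    # Count each URI once per mention in which it appears.
    counts = Counter(
        uri for candidates in candidates_per_mention for uri in set(candidates)
    )
    chosen = []
    for candidates in candidates_per_mention:
        if candidates:
            # max() keeps the earliest candidate when counts are tied.
            chosen.append(max(candidates, key=lambda uri: counts[uri]))
        else:
            chosen.append(None)
    return chosen
```

A fuller approach would presumably follow links between the candidate resources themselves rather than just counting repeats.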
So much of library work is spent "cataloging" things and trying to make them findable. I sincerely believe most people don't think this is a very relevant service these days. And I don't know about you, but I certainly don't feel starved for information. Instead, I think people want to make better use of the content they have, and enriching texts in the way outlined above may be one way of going about it.
[1] lookup service - http://bit.ly/jbg0I6
--
Eric Lease Morgan
University of Notre Dame