On Sep 17, 2012, at 3:12 PM, <[log in to unmask]> wrote:
> But I'm having trouble coming up with an algorithm that can consistently spit these out in the form we'd want to display given the data available in TGN.
A dense but rich article just published in D-Lib Magazine on geocoding -- Fulltext Geocoding Versus Spatial Metadata for Large Text Archives -- may offer some guidance. From its conclusion:
Spatial information is playing an increasing role in the access
and mediation of information, driving interest in methods capable
of extracting spatial information from the textual contents of
large document archives. Automated approaches, even using fairly
basic algorithms, can achieve upwards of 76% accuracy when
recognizing, disambiguating, and converting to mappable
coordinates the references to individual cities and landmarks
buried deep within the text of a document. The workflow of a
typical geocoding system involves identifying potential
candidates from the text, checking those candidates for potential
matches in a gazetteer, and disambiguating and confirming those
candidates. -- http://bit.ly/Ufl5k9
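
For what it's worth, that three-step workflow (candidate extraction, gazetteer lookup, disambiguation) is simple enough to sketch. Here is a rough Python illustration -- the toy gazetteer and the "pick the most populous match" disambiguation rule are just placeholders, not TGN's data model or the article's actual algorithm:

    import re

    # Toy gazetteer: place name -> list of (lat, lon, population) candidates.
    # A real system would query TGN or another gazetteer service here.
    GAZETTEER = {
        "Springfield": [(39.80, -89.65, 116000),   # Springfield, IL
                        (42.10, -72.59, 155000)],  # Springfield, MA
        "Paris":       [(48.86, 2.35, 2148000),    # Paris, France
                        (33.66, -95.56, 24000)],   # Paris, TX
    }

    def find_candidates(text):
        """Step 1: pull capitalized tokens out of the text as place-name candidates."""
        return re.findall(r"\b[A-Z][a-z]+\b", text)

    def geocode(text):
        """Steps 2-3: look candidates up in the gazetteer and disambiguate.
        Disambiguation here is just 'pick the most populous match', a stand-in
        for the context-based scoring a real system would use."""
        results = {}
        for name in find_candidates(text):
            matches = GAZETTEER.get(name)
            if matches:
                lat, lon, _ = max(matches, key=lambda m: m[2])
                results[name] = (lat, lon)
        return results

    print(geocode("The conference moved from Springfield to Paris last year."))
    # {'Springfield': (42.1, -72.59), 'Paris': (48.86, 2.35)}

Obviously the hard part in practice is the disambiguation step, which is where the context and the data actually available in TGN would come into play.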
--
ELM