On Sep 17, 2012, at 3:12 PM, <[log in to unmask]> wrote:

> But I'm having trouble coming up with an algorithm that can consistently spit these out in the form we'd want to display given the data available in TGN.


A dense but rich article just published in D-Lib Magazine, "Fulltext Geocoding Versus Spatial Metadata for Large Text Archives," may give some guidance on geocoding. From the conclusion:

 Spatial information is playing an increasing role in the access
 and mediation of information, driving interest in methods capable
 of extracting spatial information from the textual contents of
 large document archives. Automated approaches, even using fairly
 basic algorithms, can achieve upwards of 76% accuracy when
 recognizing, disambiguating, and converting to mappable
 coordinates the references to individual cities and landmarks
 buried deep within the text of a document. The workflow of a
 typical geocoding system involves identifying potential
 candidates from the text, checking those candidates for potential
 matches in a gazetteer, and disambiguating and confirming those
 candidates -- http://bit.ly/Ufl5k9
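
For what it's worth, the three-step workflow in that excerpt (find candidates, look them up in a gazetteer, disambiguate) can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions: the GAZETTEER dictionary, its entries and rough population figures, and the function names are all made up for the example and are not drawn from TGN or from the article. The "pick the most populous match" rule is just one simple heuristic, not necessarily the method the article uses.

```python
import re

# Hypothetical toy gazetteer: place name -> list of
# (display label, latitude, longitude, rough population).
# Entries and figures are illustrative only, not taken from TGN.
GAZETTEER = {
    "Paris": [
        ("Paris, France", 48.8566, 2.3522, 2100000),
        ("Paris, Texas, USA", 33.6609, -95.5555, 25000),
    ],
    "Springfield": [
        ("Springfield, Illinois, USA", 39.7817, -89.6501, 114000),
        ("Springfield, Massachusetts, USA", 42.1015, -72.5898, 155000),
    ],
}


def find_candidates(text):
    """Step 1: treat runs of capitalized words as potential place names."""
    return re.findall(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*", text)


def geocode(text, gazetteer=GAZETTEER):
    """Steps 2 and 3: look each candidate up, then disambiguate.

    Disambiguation here is an assumed heuristic: when a name has
    several gazetteer entries, keep the one with the largest population.
    """
    results = []
    for candidate in find_candidates(text):
        matches = gazetteer.get(candidate)
        if not matches:
            continue  # capitalized word that is not a known place
        label, lat, lon, _population = max(matches, key=lambda m: m[3])
        results.append((candidate, label, lat, lon))
    return results


if __name__ == "__main__":
    for hit in geocode("She moved from Paris to Springfield last year."):
        print(hit)
```

A real system would swap the dictionary for lookups against an actual gazetteer and would likely use surrounding context (other places mentioned nearby, for instance) rather than population alone to choose among matches.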

--
ELM