I use GeoNames for this sort of thing a lot. Since it offers cities and
administrative divisions in a machine-readable format, it's pretty easy to
encode places in a form that adheres to AACR2 or other cataloging rules.
There are of course problems disambiguating city names when no country is
given, but in general I get a pretty accurate response: probably greater
than 76% when I have both the city and the country, or the city and a
geographic region.
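
For what it's worth, here is a minimal sketch of the kind of lookup I mean,
against the GeoNames search web service (searchJSON). The username, the
featureClass filter, and the parenthetical-qualifier heading are just
illustrative choices; real AACR2 headings would abbreviate the qualifier,
and you would substitute your own GeoNames account name.

import requests

GEONAMES_SEARCH = "http://api.geonames.org/searchJSON"
USERNAME = "demo"  # substitute your own GeoNames account name

def lookup_place(name, country=None):
    """Return the best GeoNames match for a populated place name."""
    params = {
        "name_equals": name,
        "featureClass": "P",  # populated places only
        "maxRows": 1,
        "username": USERNAME,
    }
    if country:
        params["country"] = country  # two-letter country code
    hits = requests.get(GEONAMES_SEARCH, params=params).json().get("geonames", [])
    return hits[0] if hits else None

def heading(place):
    """Crude 'City (Larger place)' heading; AACR2 would abbreviate the qualifier."""
    qualifier = place.get("adminName1") or place.get("countryName")
    return f"{place['name']} ({qualifier})" if qualifier else place["name"]

match = lookup_place("South Bend", country="US")
if match:
    print(heading(match))  # e.g. "South Bend (Indiana)"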
Ethan
On Mon, Sep 17, 2012 at 3:16 PM, Eric Lease Morgan <[log in to unmask]> wrote:
> On Sep 17, 2012, at 3:12 PM, <[log in to unmask]> wrote:
>
> > But I'm having trouble coming up with an algorithm that can consistently
> spit these out in the form we'd want to display given the data available in
> TGN.
>
>
> A dense but rich, just-published article from D-Lib Magazine about
> geocoding -- Fulltext Geocoding Versus Spatial Metadata for Large Text
> Archives -- may give some guidance. From the conclusion:
>
> Spatial information is playing an increasing role in the access
> and mediation of information, driving interest in methods capable
> of extracting spatial information from the textual contents of
> large document archives. Automated approaches, even using fairly
> basic algorithms, can achieve upwards of 76% accuracy when
> recognizing, disambiguating, and converting to mappable
> coordinates the references to individual cities and landmarks
> buried deep within the text of a document. The workflow of a
> typical geocoding system involves identifying potential
> candidates from the text, checking those candidates for potential
> matches in a gazetteer, and disambiguating and confirming those
> candidates -- http://bit.ly/Ufl5k9
>
> --
> ELM
>
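
P.S. To make that candidate -> gazetteer -> disambiguation workflow concrete,
here is a toy end-to-end sketch. Everything in it is a stand-in for
illustration: the hard-coded gazetteer, the capitalized-word candidate
extractor, and disambiguation by simply preferring the most populous entry.
A real system would draw the gazetteer from GeoNames or TGN and use far
better heuristics.

import re

# Toy gazetteer: name -> list of (country, lat, lng, population).
GAZETTEER = {
    "Paris": [("FR", 48.85, 2.35, 2100000), ("US", 33.66, -95.55, 25000)],
    "Dublin": [("IE", 53.35, -6.26, 550000), ("US", 40.10, -83.11, 49000)],
}

def extract_candidates(text):
    """Step 1: treat capitalized words as place-name candidates (very naive)."""
    return set(re.findall(r"\b[A-Z][a-z]+\b", text))

def geocode(text):
    """Steps 2 and 3: look candidates up in the gazetteer, then disambiguate
    by preferring the most populous entry."""
    results = {}
    for candidate in extract_candidates(text):
        entries = GAZETTEER.get(candidate, [])
        if entries:
            country, lat, lng, _pop = max(entries, key=lambda e: e[3])
            results[candidate] = (country, lat, lng)
    return results

print(geocode("The meeting moved from Paris to Dublin last year."))
# Paris resolves to FR and Dublin to IE, the more populous entries.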