LISTSERV 16.5 - CODE4LIB Archives

Is there anyone out there with experience processing the raw data files for the Getty vocabularies (particularly TGN)?

We're adopting AAT and TGN as the primary vocabularies for our new shared cataloging system for our museum, library and archival collections. I'm presently trying to come up with some scripts to automate matching of places in existing databases to places in the TGN taxonomy. But I'm finding that the Getty data files are very complex, and I haven't yet figured out a foolproof method to do this. I'm curious if anyone else has traveled this road before, and if so whether you might be able to share some tips or code snippets.

Since most of our place names are going to be in the US, my gut feeling has been to first try to extract a list of places in the US and dump things like state, county, etc. into discrete database fields that I can match against. But I find myself a bit flummoxed by the polyhierarchical nature of the data (where one place can belong to multiple higher level places).
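One possible way to handle the polyhierarchy is to enumerate every path from a place up to the top of the tree and then collapse those paths into discrete fields. The sketch below is only an illustration under stated assumptions: the `PARENTS` and `PLACE_TYPE` dictionaries, the identifiers in them, and the function names are all hypothetical and do not reflect the actual layout of the Getty data files, which would first need to be parsed into something similar. It also assumes the hierarchy contains no cycles.

```python
from typing import Dict, List

# Hypothetical in-memory structures, NOT the real TGN file layout.
# Each place id maps to the ids of all of its parents (polyhierarchy).
PARENTS: Dict[str, List[str]] = {
    "boston": ["suffolk"],
    "suffolk": ["massachusetts"],
    "massachusetts": ["united_states", "new_england"],
    "new_england": ["united_states"],
    "united_states": [],
}

# Illustrative place types for the sample ids above.
PLACE_TYPE: Dict[str, str] = {
    "boston": "inhabited place",
    "suffolk": "county",
    "massachusetts": "state",
    "new_england": "general region",
    "united_states": "nation",
}


def ancestor_paths(place_id: str, parents: Dict[str, List[str]]) -> List[List[str]]:
    """Return every path from place_id up to a root, one list per path."""
    ups = parents.get(place_id, [])
    if not ups:
        return [[place_id]]
    paths = []
    for parent in ups:
        for tail in ancestor_paths(parent, parents):
            paths.append([place_id] + tail)
    return paths


def flatten(place_id, parents, place_type, wanted=("county", "state", "nation")):
    """Collapse all ancestor paths into one row keyed by place type.

    The first ancestor found for each wanted type wins; ancestors of
    other types (such as a general region) are ignored.
    """
    row = {"id": place_id}
    for path in ancestor_paths(place_id, parents):
        for ancestor in path[1:]:
            kind = place_type.get(ancestor)
            if kind in wanted:
                row.setdefault(kind, ancestor)
    return row


row = flatten("boston", PARENTS, PLACE_TYPE)
```

With the sample data, the place has two ancestor paths (one through the region, one direct), but both collapse to the same county, state and nation, so the flattened row can be loaded into discrete database fields for matching.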

Another issue is the wide variety of place types in use in the taxonomy. England, for example, is a country, but the United States is a nation. This makes sense to a degree, but it also makes it hard to decide which term to match when you're automating matching against data whose creators were less discerning about this sort of fine distinction.
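One way to soften the place type problem might be to fold the fine-grained types into a few coarse levels before matching, so that "nation" and "country" land in the same bucket. The following is a minimal sketch, not a tested solution: the `COARSE_LEVEL` mapping, the level names, the record tuples and the function names are all assumptions made up for illustration, and a real mapping would have to be built from the place types actually present in the data.

```python
# Hypothetical mapping from fine-grained place types to coarse levels.
# The entries are illustrative; a real table would need many more types.
COARSE_LEVEL = {
    "nation": "country",
    "country": "country",
    "state": "first-order division",
    "province": "first-order division",
    "county": "second-order division",
    "inhabited place": "locality",
}


def coarse_level(place_type: str) -> str:
    """Map a place type to its coarse level, or 'other' if unmapped."""
    return COARSE_LEVEL.get(place_type.strip().lower(), "other")


def match_candidates(name, level, records):
    """Return ids of records whose name and coarse level both match.

    records is an iterable of (id, name, place_type) tuples, an assumed
    shape for rows already extracted from the vocabulary.
    """
    key = name.strip().casefold()
    return [
        rec_id
        for rec_id, rec_name, rec_type in records
        if rec_name.strip().casefold() == key and coarse_level(rec_type) == level
    ]


# Made-up sample rows for demonstration only.
records = [
    ("a", "England", "country"),
    ("b", "United States", "nation"),
    ("c", "England", "inhabited place"),
]
countries_named_england = match_candidates("england", "country", records)
```

Here a legacy record that simply says "country" can match either England or the United States, while a town that happens to share a name is kept out of the candidate list.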

I feel like I'm surely not the first person to tackle this, and would love to exchange notes...

-David Dwiggins

__________

David Dwiggins
Systems Librarian/Archivist, Historic New England
141 Cambridge Street, Boston, MA 02114
(617) 227-3956 x 242
http://www.historicnewengland.org

Visit http://www.LymanEstate.org for information on renting the historic Lyman Estate for your next event - a very special place for very special occasions.