PS: Kyle, that's your own version? That's... sort of kind of machine readable. Well, not really. I can't quite figure out what's going on there: the label/value pairs are just stuffed into single JavaScript string literals, separated by newlines, or sometimes (but sometimes not) by "Assigned code:" strings, etc. That's in fact a little bit harder to parse than what I'm doing against LC. I'm running CSS selectors against the HTML; I'm not having any difficulty parsing, the problem is that the format can change without notice. But yours seems harder to parse to me; am I missing something?

In the end, all I need is a list of pairs, code to label. I'll be looking up by code, so I don't even care about "alternate labels", really.

On 6/22/2011 5:57 PM, Kyle Banerjee wrote:
> I went through a process similar to what you describe some time back for a
> tool I made (i.e. I could find no easily downloadable info). You can
> download something that will be easier to parse from
>
> http://calculate.alptown.com/gac.js
>
> It's probably not 100% accurate as I haven't downloaded it for quite a while.
> But catalogers have me correct errors they discover, and there are about 800
> unique visitors per day, so I assume they notice most things.
>
> It would be nice if this kind of data could be provided in a straightforward
> format.
>
> kyle
>
>
>
> On Wed, Jun 22, 2011 at 2:44 PM, Jonathan Rochkind wrote:
>
>> Can anyone remind me if there's a machine readable copy of the MARC
>> geographic codes available at any persistent URL?
>>
>> They're in HTML at http://www.loc.gov/marc/geoareas/gacs_code.html. I
>> actually had a script that automatically downloaded from there and
>> "scraped" the HTML -- but sometime since I wrote the script, the HTML
>> structure on the page changed and it broke.
>>
>> (I kind of thought that was unlikely since that HTML page itself was
>> machine generated -- but I guess they changed the software that generated
>> it. Certainly I knew that scraping HTML was a bad thing to rely on... which
>> is why I hope LC provides this in some format less likely to change?)
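Since the real risk with scraping is that the markup changes silently, one defensive pattern is to check what the scraper extracts and fail loudly when nothing recognizable comes back. The following is a minimal sketch in Python, not the script discussed in this thread. It swaps the CSS-selector approach for the standard library's html.parser, and it assumes (hypothetically, since the current layout of gacs_code.html isn't described here) that each code sits in the first cell of a table row with its label in the second. It also assumes a valid code is seven lowercase letters or hyphens, like n-us---. The names TwoColumnTableParser, parse_gac_pairs and GAC_PATTERN are made up for illustration.

```python
import re
from html.parser import HTMLParser

# Assumption: a MARC geographic area code is seven characters of
# lowercase letters padded with hyphens, e.g. "n-us---" or "e-uk-en".
GAC_PATTERN = re.compile(r"^[a-z-]{7}$")


class TwoColumnTableParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse internal whitespace so " n-us--- " becomes "n-us---".
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_gac_pairs(html):
    """Return a dict mapping code -> label; raise if the layout looks wrong."""
    parser = TwoColumnTableParser()
    parser.feed(html)
    parser.close()
    pairs = {}
    for row in parser.rows:
        # Keep only rows whose first cell looks like a well-formed code;
        # header rows and anything else are skipped.
        if len(row) >= 2 and GAC_PATTERN.match(row[0]):
            pairs[row[0]] = row[1]
    if not pairs:
        # Fail loudly instead of silently returning an empty lookup table.
        raise ValueError("no code/label pairs found; page layout may have changed")
    return pairs


if __name__ == "__main__":
    sample = """
    <table>
      <tr><th>Code</th><th>Name</th></tr>
      <tr><td>n-us---</td><td>United States</td></tr>
      <tr><td>e-uk-en</td><td>England</td></tr>
    </table>
    """
    codes = parse_gac_pairs(sample)
    print(codes["n-us---"])
```

If a layout change means no row yields a well-formed code, parse_gac_pairs raises ValueError rather than handing back an empty code-to-label table, so the breakage shows up when the download runs instead of later, when a lookup quietly misses.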