PS: Kyle, that's your own version? That's... sort of kind of machine
readable. Well, not really. I can't figure out quite what's going on:
string literals, separated by newlines, or sometimes (but sometimes not)
with "Assigned code:" strings, etc.
That's in fact a little bit harder to parse than what I'm doing against
LC. I'm running CSS selectors against the HTML; I'm not having any
difficulty parsing, the problem is that the format can change without
notice. But yours seems harder to parse to me, am I missing something?
In the end, all I need is a list of pairs, code to label. I'll be
looking up from code, so I don't even care about "alternate labels",
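For concreteness, here is a minimal sketch of the kind of code-to-label
extraction described above, using only the Python standard library. It is
not the script discussed in this thread. The markup it expects (one table
row per entry, code in the first cell, label in the second) is an
assumption made for illustration and may not match the LC page; the names
CodeLabelParser and parse_codes and the sample rows are likewise made up.

```python
from html.parser import HTMLParser


class CodeLabelParser(HTMLParser):
    """Collect code -> label pairs from two-cell table rows.

    ASSUMPTION: each entry is a <tr> whose first <td> holds the code and
    whose second <td> holds the label. The real page may be laid out
    differently, and can change without notice.
    """

    def __init__(self):
        super().__init__()
        self.pairs = {}
        self._cells = None  # cells of the current row, None outside a row
        self._text = None   # text chunks of the current cell, None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
            self._text = None
        elif tag == "td" and self._cells is not None:
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._text is not None:
            # Collapse internal whitespace and store the finished cell.
            self._cells.append(" ".join("".join(self._text).split()))
            self._text = None
        elif tag == "tr" and self._cells is not None:
            # Keep only rows with a non-empty code and a label; header rows
            # built from <th> cells yield no <td> cells and are skipped.
            if len(self._cells) >= 2 and self._cells[0]:
                self.pairs.setdefault(self._cells[0], self._cells[1])
            self._cells = None
            self._text = None


def parse_codes(html):
    """Return a dict mapping each code to its first-listed label."""
    parser = CodeLabelParser()
    parser.feed(html)
    parser.close()
    return parser.pairs


# Illustrative sample only, not copied from the LC page.
sample = """
<table>
<tr><th>Code</th><th>Name</th></tr>
<tr><td>a-ja---</td><td>Japan</td></tr>
<tr><td>n-us-or</td><td>Oregon</td></tr>
</table>
"""

codes = parse_codes(sample)
print(codes["n-us-or"])
```

Since lookups only go from code to label, a plain dict is enough, and
keeping the first label seen for each code ignores any alternate labels.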
On 6/22/2011 5:57 PM, Kyle Banerjee wrote:
> I went through a process similar to what you describe sometime back for a
> tool I made (i.e. I could find no easily downloadable info). You can
> download something that will be easier to parse from
> It's probably not 100% accurate as I haven't downloaded for quite a while.
> But catalogers have me correct errors they discover and there are about 800
> unique visitors per day so I assume they notice most things.
> It would be nice if this kind of data could be provided in a straightforward
> On Wed, Jun 22, 2011 at 2:44 PM, Jonathan Rochkind wrote:
>> Can anyone remind me if there's a machine readable copy of the MARC
>> geographic codes available at any persistent URL?
>> They're in HTML at http://www.loc.gov/marc/geoareas/gacs_code.html. I
>> actually had a script that automatically downloaded from there and
>> "scraped" the HTML -- but sometime since I wrote the script, the HTML
>> structure on the page changed and it broke.
>> (I kind of thought that was unlikely since that HTML page itself was
>> machine generated -- but I guess they changed the software that generated
>> it. Certainly I knew that scraping HTML was a bad thing to rely on... which
>> is why I hope LC provides this in some format less likely to change?)