Aha, that's probably what I need. And now I remember Ross probably
pointed that out to me before.
I'm still having trouble figuring out how to get from the RDF triples
it's got there to a hash mapping codes (as they appear in MARC records,
not URIs) to labels.
It seems like it will in fact be a lot more work than the scraping I'm
doing of the HTML page now, but of course the problem with the HTML page
is that its structure is not reliable; it changes. So the structured
data from id.loc.gov is the way to go, but I'm still confused about
how to get what I want out of it. If anyone wants to give
me any hints, they'd be appreciated.
It kind of looks like I FIRST have to get the complete list from one of
the structured forms (RDF/XML, N-Triples, etc.), and THEN make a separate
HTTP request for _each_ term in the list to get the code as found
in the MARC record and the label. That's a pretty slow process, and it
requires writing more code than a task like this seems like it
should take. Is there anything on that site that can give me the
code/label pairs in one single download?
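For what it's worth, here is a rough sketch (in Python) of the
triples-to-hash step, assuming the triples are already in hand as
N-Triples text. Several things in it are assumptions rather than
anything confirmed about id.loc.gov: that labels sit on skos:prefLabel
triples, that each subject URI ends in the code with its trailing
hyphens stripped (e.g. .../geographicAreas/n-us), and that padding back
to seven characters with hyphens gives the form used in MARC records.
The function name codes_to_labels is made up for illustration.

```python
import re

# Assumption: labels are carried on skos:prefLabel triples.
PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"

# Matches one N-Triples line of the form:
#   <subject> <predicate> "literal"[@lang or ^^<datatype>] .
# Lines whose object is a URI or blank node do not match and are skipped.
TRIPLE_RE = re.compile(
    r'^<(?P<s>[^>]+)>\s+<(?P<p>[^>]+)>\s+'
    r'"(?P<o>(?:[^"\\]|\\.)*)"(?:@[A-Za-z-]+|\^\^<[^>]+>)?\s*\.\s*$'
)


def codes_to_labels(ntriples_text, label_predicate=PREF_LABEL, width=7):
    """Build a dict mapping MARC-style codes to labels.

    Assumes the last path segment of each subject URI is the code with
    trailing hyphens removed, and pads it back out to `width` characters.
    Escape sequences inside literals are left as-is (not decoded).
    """
    result = {}
    for line in ntriples_text.splitlines():
        m = TRIPLE_RE.match(line.strip())
        if not m or m.group("p") != label_predicate:
            continue
        # Take the last URI path segment as the (unpadded) code.
        code = m.group("s").rstrip("/").rsplit("/", 1)[-1]
        result[code.ljust(width, "-")] = m.group("o")
    return result
```

This only covers the parsing; it does not answer whether the whole
vocabulary can be fetched in one request, and a real RDF library would
be more robust than a regex if the data contains escaped characters.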
On 6/22/2011 6:38 PM, Stephen Hearn wrote:
> Have you looked at id.loc.gov? One of its vocabularies defines URLs
> for each of the MARC geographic area codes.
> On Wed, Jun 22, 2011 at 4:44 PM, Jonathan Rochkind wrote:
>> Can anyone remind me if there's a machine readable copy of the MARC
>> geographic codes available at any persistent URL?
>> They're in HTML at http://www.loc.gov/marc/geoareas/gacs_code.html . I
>> actually had a script that automatically downloaded from there and "scraped"
>> the HTML -- but sometime since I wrote the script, the HTML structure on the
>> page changed and it broke.
>> (I kind of thought that was unlikely since that HTML page itself was machine
>> generated -- but I guess they changed the software that generated it.
>> Certainly I knew that scraping HTML was a bad thing to rely on... which is
>> why I hope LC provides this in some format less likely to change?)