Aha, that's probably what I need. And now I remember Ross probably 
pointed that out to me before.

I'm still having trouble figuring out how to get from the RDF triples 
it's got there to a hash mapping codes (as they appear in MARC records, 
not URIs) to labels.

It seems like it will in fact be a lot more work than the scraping I'm 
doing of the HTML page now, but of course the problem with the HTML page 
is that its structure is not reliable; it changes.  So the structured 
data from id.loc.gov is the way to go... but I'm still getting confused 
figuring out how to get what I want out of it. If anyone wants to give 
me any hints, they'd be appreciated.

It kind of looks like I FIRST have to get the complete list from one of 
the structured forms (RDF-XML, triples, etc), and THEN make a separate 
HTTP request for _each_ term listed in the list to get the code as found 
in the MARC record and the label.  That's a pretty slow process, as well 
as requiring writing more code than a task like this seems like it 
should take. Is there anything on that site that can give me the 
code/label pairs in one single download?
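
To make concrete what I'm after, here is a rough Python sketch of the 
transformation, written against made-up sample data. Everything specific 
in it is an assumption, not something I've confirmed about id.loc.gov: 
that the data can be had as N-Triples text in one piece, that labels are 
carried by skos:prefLabel, that the code is the last path segment of each 
concept URI, and that the MARC form is that segment padded with hyphens 
to seven characters. The function name and the sample triples are 
hypothetical.

```python
import re

# ASSUMPTION: labels are expressed with skos:prefLabel.
PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"

# Matches one N-Triples line whose object is a literal:
#   <subject> <predicate> "literal"@lang .
# Lines whose object is a URI or blank node do not match and are skipped.
TRIPLE_RE = re.compile(
    r'^\s*<([^>]*)>\s+<([^>]*)>\s+"((?:[^"\\]|\\.)*)"'
    r'(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?\s*\.\s*$'
)


def codes_to_labels(ntriples_text, base_uri, pad_to=7):
    """Build a {code: label} dict from N-Triples text (hypothetical helper).

    ASSUMPTION: the code is whatever follows base_uri in the subject URI,
    padded on the right with hyphens to pad_to characters.
    Backslash escapes inside literals are left undecoded.
    """
    result = {}
    for line in ntriples_text.splitlines():
        match = TRIPLE_RE.match(line)
        if not match:
            continue
        subject, predicate, literal = match.groups()
        if predicate != PREF_LABEL or not subject.startswith(base_uri):
            continue
        code = subject[len(base_uri):].ljust(pad_to, "-")
        # Keep the first label seen for each code.
        result.setdefault(code, literal)
    return result


# Made-up sample input, for illustration only.
SAMPLE = """\
<http://id.loc.gov/vocabulary/geographicAreas/n-us> <http://www.w3.org/2004/02/skos/core#prefLabel> "United States"@en .
<http://id.loc.gov/vocabulary/geographicAreas/e-fr> <http://www.w3.org/2004/02/skos/core#prefLabel> "France"@en .
<http://id.loc.gov/vocabulary/geographicAreas/e-fr> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .
"""
BASE = "http://id.loc.gov/vocabulary/geographicAreas/"

if __name__ == "__main__":
    for code, label in sorted(codes_to_labels(SAMPLE, BASE).items()):
        print(code, label)
```

If the real data doesn't put the code and label on the same subject like 
this, the per-term requests described above would still be needed.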


On 6/22/2011 6:38 PM, Stephen Hearn wrote:
> Have you looked at id.loc.gov? One of its vocabularies defines URLs
> for each of the MARC geographic area codes.
>
> Stephen
>
>
> On Wed, Jun 22, 2011 at 4:44 PM, Jonathan Rochkind wrote:
>> Can anyone remind me if there's a machine readable copy of the MARC
>> geographic codes available at any persistent URL?
>>
>> They're in HTML at http://www.loc.gov/marc/geoareas/gacs_code.html . I
>> actually had a script that automatically downloaded from there and "scraped"
>> the HTML -- but sometime since I wrote the script, the HTML structure on the
>> page changed and it broke.
>>
>> (I kind of thought that was unlikely since that HTML page itself was machine
>> generated -- but I guess they changed the software that generated it.
>> Certainly I knew that scraping HTML was a bad thing to rely on... which is
>> why I hope LC provides this in some format less likely to change?)
>>
>
>