On 8/22/12 2:56 PM, Richard Wallis wrote:
> Hi Karen,
>
> I was not ignoring your previous question about where, in MARC terms, the data
> was coming from. I need to talk with someone who was in the core of the
> processing that produces the data. Unfortunately I am currently being
> thwarted by vacations.
Richard, I understand, and apologize if I appeared to be pushing too
hard. In my own experience, requests for documentation are met with
groans, especially from folks who'd rather be "doing something useful,"
like writing code. Unfortunately, it really helps to explain what you've
done.
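I think I've solved the question of where the place of publication comes
from: 260 $a. The difference between the Web version and the triples
version is in the punctuation: the Web (RDFa) version keeps the trailing
punctuation from the MARC subfield (e.g. "New York :"), while the triples
file drops it ("New York"). I'm still looking at examples, but it's a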
slog since I'm re-creating records in the triples file with my minimal
knowledge of "grep" -- a hammer, but the best darned hammer there is.
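For anyone who would rather not use grep, here is a minimal Python sketch of pulling one record out of the triples file. It assumes the file is N-Triples with one statement per line ending in " ." (as in example #2), uses a deliberately simplified regular expression rather than a real RDF parser, follows blank nodes only one level deep, and builds an in-memory index of the whole file. The function names are illustrative, not anything OCLC provides.

```python
import re

# Simplified N-Triples pattern (assumption): subject, predicate, object, final ".".
# Good enough for IRIs, blank nodes and simple literals; not a full parser.
TRIPLE = re.compile(r'^(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$')


def parse(lines):
    """Yield (subject, predicate, object) tuples, skipping lines that don't match."""
    for line in lines:
        m = TRIPLE.match(line.strip())
        if m:
            yield m.groups()


def record_for(subject, lines):
    """Return the triples about `subject` plus those about blank nodes it points to."""
    by_subject = {}
    for s, p, o in parse(lines):
        by_subject.setdefault(s, []).append((s, p, o))
    record = list(by_subject.get(subject, []))
    # Follow blank-node objects one level deep (e.g. the Place node).
    for _, _, o in list(record):
        if o.startswith("_:"):
            record.extend(by_subject.get(o, []))
    return record


# The three statements from example #2; in practice `lines` would be the open file.
sample = [
    '_:A52eb8ca1X3aX138a1313c61X3aXX2dX7536 <http://schema.org/name> "Garden City, N.Y." .',
    '_:A52eb8ca1X3aX138a1313c61X3aXX2dX7536 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Place> .',
    '<http://www.worldcat.org/oclc/524483> <http://purl.org/library/placeOfPublication> _:A52eb8ca1X3aX138a1313c61X3aXX2dX7536 .',
]
for triple in record_for("<http://www.worldcat.org/oclc/524483>", sample):
    print(triple)
```

Because the blank-node statements can appear before the record's own statement (as they do in example #2), the sketch indexes everything first and then follows the links, instead of filtering in a single pass.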
Here are some examples:
#1
File:
<http://www.worldcat.org/oclc/43836713> <http://purl.org/library/placeOfPublication> _:AX2dX40d4c600X3aX138a12b56f9X3aXX2dX49b9
_:AX2dX40d4c600X3aX138a12b56f9X3aXX2dX49b9 <http://schema.org/name> "New York"
Web: (Using RDFa It Firefox plugin [1])
<http://www.worldcat.org/oclc/43836713> a schema:Book;
library:placeOfPublication [ a schema:Place;
schema:name "New York :"@en ];
#2
File:
_:A52eb8ca1X3aX138a1313c61X3aXX2dX7536 <http://schema.org/name> "Garden City, N.Y." .
_:A52eb8ca1X3aX138a1313c61X3aXX2dX7536 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Place> .
<http://www.worldcat.org/oclc/524483> <http://purl.org/library/placeOfPublication> _:A52eb8ca1X3aX138a1313c61X3aXX2dX7536 .
Web:
<http://www.worldcat.org/oclc/524483> a schema:Book;
library:holdingsCount "803"@en;
library:oclcnum "524483"@en;
library:placeOfPublication [ a schema:Place;
schema:name "Garden City, N.Y.,"@en ];
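Both examples fit one pattern: the Web version keeps the trailing punctuation from 260 $a, and the triples file drops it. A minimal sketch of that normalization, assuming the only difference is a trailing " :", " ;", "," or "/". A final period is kept, since in strings like "N.Y." it belongs to an abbreviation. This is a guess that reproduces the two examples above, not OCLC's actual processing, and `strip_isbd` is a hypothetical name.

```python
import re

# Trailing separator punctuation (assumption: colon, semicolon, comma, slash,
# plus surrounding spaces). A final "." is left alone because it is usually
# part of an abbreviation such as "N.Y.".
TRAILING = re.compile(r'[\s:;,/]+$')


def strip_isbd(place: str) -> str:
    """Remove trailing separator punctuation from a place-of-publication string."""
    return TRAILING.sub("", place.strip())


print(strip_isbd("New York :"))          # New York
print(strip_isbd("Garden City, N.Y.,"))  # Garden City, N.Y.
```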
Another thing I noticed is that each instance of a place of
publication string is given a new identity (a separate blank node):
_:AX2dX44931a01X3aX138a139ed19X3aXX2dX600d <http://schema.org/name> "Garden City, N.Y." .
_:AX2dX44a4d9f9X3aX138a132b9d9X3aXX2dX4efe <http://schema.org/name> "Garden City, N.Y." .
_:AX2dX44a4d9f9X3aX138a1378e1dX3aXX2dX1d8d <http://schema.org/name> "Garden City, N.Y." .
_:AX2dX45b46946X3aX138a139141cX3aXX2dX7073 <http://schema.org/name> "Garden City, N.Y." .
_:AX2dX4a6da202X3aX138a1387049X3aXX2dX7bcc <http://schema.org/name> "Garden City, N.Y." .
_:AX2dX4b32d4b9X3aX138a1316a9bX3aXX2dX5f92 <http://schema.org/name> "Garden City, N.Y." .
_:AX2dX4b5c4da3X3aX138a135d400X3aXX2dX515e <http://schema.org/name> "Garden City, N.Y." .
_:AX2dX4b93edacX3aX138a1314f3eX3aXX2dX58e9 <http://schema.org/name> "Garden City, N.Y." .
_:AX2dX4c810b47X3aX138a134150cX3aXX2dX5b77 <http://schema.org/name> "Garden City, N.Y." .
_:AX2dX4f8be47aX3aX138a12b4eb9X3aXX2dX1677 <http://schema.org/name> "Garden City, N.Y." .
_:AX2dX4f8be47aX3aX138a12b4eb9X3aXX2dX23e1 <http://schema.org/name> "Garden City, N.Y." .
_:AX2dX52ad903aX3aX138a12d336bX3aXX2dX5389 <http://schema.org/name> "Garden City, N.Y." .
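To see how many separate identities a single place name has picked up across the whole file, the name statements can be indexed by string. A rough sketch, assuming one N-Triples statement per line and names without escaped quotes; `nodes_per_name` is a hypothetical helper, and the sample is the first three statements listed above.

```python
import re
from collections import defaultdict

# Blank node subject with a schema.org name literal (escaped quotes not handled).
NAME = re.compile(r'^(_:\S+)\s+<http://schema\.org/name>\s+"([^"]*)"')


def nodes_per_name(lines):
    """Map each name string to the set of blank nodes that carry it."""
    index = defaultdict(set)
    for line in lines:
        m = NAME.match(line.strip())
        if m:
            node, name = m.groups()
            index[name].add(node)
    return index


sample = [
    '_:AX2dX44931a01X3aX138a139ed19X3aXX2dX600d <http://schema.org/name> "Garden City, N.Y." .',
    '_:AX2dX44a4d9f9X3aX138a132b9d9X3aXX2dX4efe <http://schema.org/name> "Garden City, N.Y." .',
    '_:AX2dX44a4d9f9X3aX138a1378e1dX3aXX2dX1d8d <http://schema.org/name> "Garden City, N.Y." .',
]
for name, nodes in nodes_per_name(sample).items():
    print(f"{name}: {len(nodes)} blank nodes")
```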
Where punctuation doesn't cloud the picture, these could eventually be
linked to:
http://id.loc.gov/authorities/names/n50068040.html
and:
http://www.geonames.org/5118226/garden-city.html
and in that way could have a shared identity.
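As a sketch of what that linking could look like: if a lookup table maps a place-name string to one shared IRI, every blank node carrying that name can be replaced by the IRI, and the duplicate name statements collapse into one. The table is hypothetical, matches only exact (already stripped) names, and uses the LC authority cited above without the ".html" suffix as the identifier; which URI to link to is an open choice. The second record subject and the labels _:b1/_:b2 are made up for illustration.

```python
import re

# Hypothetical lookup table: exact place-name string -> shared IRI.
# Assumption: the LC name authority cited above, without ".html".
SHARED = {"Garden City, N.Y.": "<http://id.loc.gov/authorities/names/n50068040>"}

# Blank node subject carrying a schema.org name (escaped quotes not handled).
NAME = re.compile(r'^(_:\S+)\s+<http://schema\.org/name>\s+"([^"]*)"')


def relink(lines, shared=SHARED):
    """Swap blank nodes for shared IRIs wherever the node's name is in `shared`."""
    lines = [line.strip() for line in lines]
    mapping = {}
    for line in lines:
        m = NAME.match(line)
        if m and m.group(2) in shared:
            mapping[m.group(1)] = shared[m.group(2)]
    relinked = set()
    for line in lines:
        for node, iri in mapping.items():
            # Assumes each blank node label is followed by a space, as in the excerpts above.
            line = line.replace(node + " ", iri + " ")
        relinked.add(line)
    # The set drops the now-identical name statements.
    return sorted(relinked)


# Two hypothetical records, each with its own blank node for the same place.
sample = [
    '<http://www.worldcat.org/oclc/524483> <http://purl.org/library/placeOfPublication> _:b1 .',
    '_:b1 <http://schema.org/name> "Garden City, N.Y." .',
    '<http://example.org/record/2> <http://purl.org/library/placeOfPublication> _:b2 .',
    '_:b2 <http://schema.org/name> "Garden City, N.Y." .',
]
for line in relink(sample):
    print(line)
```

Running the place names through a punctuation normalizer before the lookup would extend this to strings where punctuation does cloud the picture.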
kc
P.S. Richard and I are on a list with someone who has loaded the triples
into a database. I will ask if we can announce it here, and will also
try to figure out how to use the SPARQL endpoint and provide some
examples, if that is ok with the dc:creator of the database.
[1] javascript:location.href='http://www.w3.org/2012/pyRdfa/extract?format=turtle&uri='+escape(location.href)
>
> In the meantime, can you let me have a few examples of where you are seeing
> discrepancies between the downloaded triples and the RDFa embedded in
> WorldCat.org pages?
>
> ~Richard.
>
> On 22 August 2012 19:08, Karen Coyle wrote:
>
>> Richard, I've run into yet another area where documentation would be
>> helpful. There are differences between the schema.org RDFa that is
>> embedded in WorldCat pages and the exported WorldCat triples in the file.
>> One of those differences happens to be the source of the place of
>> publication, if I am reading it right. So, again, a request for
>> documentation on the fields included and their MARC source.
>>
>> Thanks,
>>
>> kc
>>
>> On 8/17/12 8:38 AM, Richard Wallis wrote:
>>
>>> In case you missed the press release earlier this week.
>>>
>>> You can now download a significant number of RDF triples describing the
>>> most highly held 1.2 million resources in WorldCat. Licensed under
>>> ODC-BY.
>>>
>>> I've posted more details on my blog:
>>> http://dataliberate.com/2012/08/get-yourself-a-linked-data-piece-of-worldcat-to-play-with/
>>>
>>> ~Richard.
>>>
>> --
>> Karen Coyle
>> http://kcoyle.net
>> ph: 1-510-540-7596
>> m: 1-510-435-8234
>> skype: kcoylenet
>>
>
>
--
Karen Coyle
http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet