On 8/23/12 12:15 AM, Richard Wallis wrote:
> Hi Karen,
>
> Those that want to play with this data in their own triplestore may be
> interested in my post about doing that myself: Putting WorldCat Data Into
> A Triple Store <http://dataliberate.com/2012/08/putting-worldcat-data-into-a-triple-store/>

Thanks. Now I know what I was doing wrong in my SPARQL queries -- I left
off the <>. Back I go to try again.
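
For anyone hitting the same snag: SPARQL expects a full IRI to be wrapped in angle brackets, and without them the parser tries to read it as a prefixed name. Here is a minimal sketch in Python that builds such a query as a string. The helper names and the shape of the query are illustrative assumptions, not taken from Richard's post; the predicate IRIs are the ones that appear in the triples quoted further down.

```python
def iri(value: str) -> str:
    """Wrap a full IRI in the angle brackets SPARQL requires."""
    return f"<{value}>"


def place_query(oclc_number: str) -> str:
    """Build a query for the place-of-publication name of one record.

    Hypothetical helper: it only assembles the query text and does not
    contact any endpoint.
    """
    subject = iri(f"http://www.worldcat.org/oclc/{oclc_number}")
    place_of_publication = iri("http://purl.org/library/placeOfPublication")
    name = iri("http://schema.org/name")
    return (
        "SELECT ?name WHERE {\n"
        f"  {subject} {place_of_publication} ?place .\n"
        f"  ?place {name} ?name .\n"
        "}"
    )


if __name__ == "__main__":
    print(place_query("524483"))
```

The resulting text can be pasted into whatever query form the triplestore offers.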

>
>
> I am intrigued by your identification of punctuation differences; it seems
> like one of the outputs has been through an extra cleanup step.  I will
> find out.

My guess is that the person working with the code noticed the ending 
punctuation and thought "Now THAT'S stupid" and removed it. Good on them!
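
A guess at what such a cleanup step might look like, sketched in Python. This is not OCLC's code; the function name and the set of characters to strip are assumptions based only on the two examples quoted below ("New York :" and "Garden City, N.Y.,"). A trailing period is left alone on purpose, since in "N.Y." it belongs to the abbreviation.

```python
import re

# Trailing separators seen after 260 $a in the quoted examples:
# colon, comma, semicolon, with optional surrounding whitespace.
_TRAILING_SEPARATOR = re.compile(r"\s*[:;,]\s*$")


def strip_trailing_punctuation(place: str) -> str:
    """Remove one trailing separator from a place-of-publication string."""
    return _TRAILING_SEPARATOR.sub("", place)


if __name__ == "__main__":
    for raw in ["New York :", "Garden City, N.Y.,"]:
        print(repr(raw), "->", repr(strip_trailing_punctuation(raw)))
```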

kc

>
> On the creation of multiple identifiers for each instance of a place name -
> this is a symptom of the way the experimental data is created using what
> are called blank nodes.  Ideally we would have minted a URI for each unique
> place and linked all references to it.  Unfortunately, this was not easily
> achievable, as part of the experiment, on top of production WorldCat.
> Solving issues such as this is on the agenda as our work in this area
> evolves.
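
To make the "mint a URI for each unique place" idea concrete, here is a rough Python sketch. Everything in it is hypothetical: the `http://example.org/place/` namespace, the slug rule, and the `_:b1`-style blank node labels are placeholders for illustration, and matching on the literal string alone is a simplification that a production process would need to improve on.

```python
from urllib.parse import quote

# Hypothetical namespace; a real one would be chosen by the data publisher.
BASE = "http://example.org/place/"


def mint_place_uri(name: str) -> str:
    """Derive one URI from a place name so equal names share an identity."""
    slug = name.strip().lower().replace(" ", "-")
    return BASE + quote(slug, safe="-")


def merge_blank_nodes(names_by_node: dict) -> dict:
    """Map each blank node label to the URI minted from its name."""
    return {node: mint_place_uri(name) for node, name in names_by_node.items()}


if __name__ == "__main__":
    # Placeholder labels standing in for the long generated ones.
    sample = {
        "_:b1": "Garden City, N.Y.",
        "_:b2": "Garden City, N.Y.",
        "_:b3": "New York",
    }
    for node, uri in merge_blank_nodes(sample).items():
        print(node, "->", uri)
```

With this mapping, every reference that previously pointed at its own blank node could point at the shared URI instead.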
>
> Keep the comments coming - they are very helpful.
>
> ~Richard.
>
> On 23 August 2012 00:56, Karen Coyle <[log in to unmask]> wrote:
>
>> On 8/22/12 2:56 PM, Richard Wallis wrote:
>>
>>> Hi Karen,
>>>
>>> I was not ignoring your previous question about where, in MARC terms, data
>>> was coming from.  I need to talk with someone who was in the core of the
>>> processing that produces the data.  Unfortunately I am currently being
>>> thwarted by vacations.
>>>
>> Richard, I understand, and apologize if I appeared to be pushing too hard.
>> In my own experience, requests for documentation are met with groans,
>> especially by folks who'd rather be "doing something useful," like writing
>> code. Unfortunately, it really helps to explain what you've done.
>>
>> I think I've solved the question of where the place of publication comes
>> from: 260 $a. The difference between the Web version and the triples
>> version is in punctuation. I'm still looking at examples, but it's a slog
>> since I'm re-creating records in the triples file with my minimal knowledge
>> of "grep" -- a hammer, but the best darned hammer there is. Here are some
>> examples:
>>
>> #1
>> File:
>>
>> <http://www.worldcat.org/oclc/43836713>
>> <http://purl.org/library/placeOfPublication>
>> _:AX2dX40d4c600X3aX138a12b56f9X3aXX2dX49b9
>> _:AX2dX40d4c600X3aX138a12b56f9X3aXX2dX49b9 <http://schema.org/name>
>> "New York"
>>
>>
>> Web: (Using RDFa It Firefox plugin [1])
>> <http://www.worldcat.org/oclc/43836713>
>> a schema:Book;
>>        library:placeOfPublication [ a schema:Place;
>>           schema:name "New York :"@en ];
>>
>> #2
>>
>> File:
>> _:A52eb8ca1X3aX138a1313c61X3aXX2dX7536 <http://schema.org/name>
>> "Garden City, N.Y." .
>> _:A52eb8ca1X3aX138a1313c61X3aXX2dX7536
>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>> <http://schema.org/Place> .
>> <http://www.worldcat.org/oclc/524483>
>> <http://purl.org/library/placeOfPublication>
>> _:A52eb8ca1X3aX138a1313c61X3aXX2dX7536 .
>>
>> Web:
>> <http://www.worldcat.org/oclc/524483>
>> a schema:Book;
>>      library:holdingsCount "803"@en;
>>      library:oclcnum "524483"@en;
>>      library:placeOfPublication [ a schema:Place;
>>              schema:name "Garden City, N.Y.,"@en ];
>>
>> Another piece of information is that each instance of a place of
>> publication string is given a new identity:
>>
>> _:AX2dX44931a01X3aX138a139ed19X3aXX2dX600d <http://schema.org/name>
>> "Garden City, N.Y." .
>> _:AX2dX44a4d9f9X3aX138a132b9d9X3aXX2dX4efe <http://schema.org/name>
>> "Garden City, N.Y." .
>> _:AX2dX44a4d9f9X3aX138a1378e1dX3aXX2dX1d8d <http://schema.org/name>
>> "Garden City, N.Y." .
>> _:AX2dX45b46946X3aX138a139141cX3aXX2dX7073 <http://schema.org/name>
>> "Garden City, N.Y." .
>> _:AX2dX4a6da202X3aX138a1387049X3aXX2dX7bcc <http://schema.org/name>
>> "Garden City, N.Y." .
>> _:AX2dX4b32d4b9X3aX138a1316a9bX3aXX2dX5f92 <http://schema.org/name>
>> "Garden City, N.Y." .
>> _:AX2dX4b5c4da3X3aX138a135d400X3aXX2dX515e <http://schema.org/name>
>> "Garden City, N.Y." .
>> _:AX2dX4b93edacX3aX138a1314f3eX3aXX2dX58e9 <http://schema.org/name>
>> "Garden City, N.Y." .
>> _:AX2dX4c810b47X3aX138a134150cX3aXX2dX5b77 <http://schema.org/name>
>> "Garden City, N.Y." .
>> _:AX2dX4f8be47aX3aX138a12b4eb9X3aXX2dX1677 <http://schema.org/name>
>> "Garden City, N.Y." .
>> _:AX2dX4f8be47aX3aX138a12b4eb9X3aXX2dX23e1 <http://schema.org/name>
>> "Garden City, N.Y." .
>> _:AX2dX52ad903aX3aX138a12d336bX3aXX2dX5389 <http://schema.org/name>
>> "Garden City, N.Y." .
>>
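
The same tally can be done without grep. Below is a small Python sketch that groups blank node labels by the schema:name literal they carry, which makes the "one new identity per occurrence" pattern easy to count. It assumes each triple sits on a single line (the examples above are wrapped by the mail client), uses a simple regular expression rather than a real N-Triples parser, and the `_:b1`-style labels in the sample are placeholders.

```python
import re
from collections import defaultdict

# Matches: <blank node> <http://schema.org/name> "<literal>" ...
NAME_TRIPLE = re.compile(
    r'^(_:\S+)\s+<http://schema\.org/name>\s+"((?:[^"\\]|\\.)*)"'
)


def nodes_by_name(lines):
    """Group blank node labels by the place-name literal attached to them."""
    groups = defaultdict(set)
    for line in lines:
        match = NAME_TRIPLE.match(line.strip())
        if match:
            node, name = match.groups()
            groups[name].add(node)
    return groups


if __name__ == "__main__":
    sample = [
        '_:b1 <http://schema.org/name> "Garden City, N.Y." .',
        '_:b2 <http://schema.org/name> "Garden City, N.Y." .',
        '_:b3 <http://schema.org/name> "New York" .',
        '_:b1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Place> .',
    ]
    for name, nodes in sorted(nodes_by_name(sample).items()):
        print(f"{name!r}: {len(nodes)} blank node(s)")
```

For the full download, passing an open file object in place of the sample list would work the same way, since the function only iterates over lines.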
>>
>> Where punctuation doesn't cloud the picture, these could eventually be
>> linked to:
>>    http://id.loc.gov/authorities/names/n50068040.html
>> and:
>>    http://www.geonames.org/5118226/garden-city.html
>>
>> and in that way could have a shared identity.
>>
>> kc
>>
>> p.s. Richard and I are on a list with someone who has loaded the triples
>> into a database. I will ask if we can announce it here, and will also try
>> to figure out how to use the SPARQL endpoint and provide some examples, if
>> that is ok with the dc:creator of the database.
>>
>> [1] javascript:location.href='http://www.w3.org/2012/pyRdfa/extract?format=turtle&uri='+escape(location.href)
>>
>>
>>> In the meantime, can you let me have a few examples of where you are
>>> seeing discrepancies between the downloaded triples and the RDFa
>>> embedded in WorldCat.org pages?
>>>
>>> ~Richard.
>>>
>>> On 22 August 2012 19:08, Karen Coyle <[log in to unmask]> wrote:
>>>
>>>   Richard, I've run into yet another area where documentation would be
>>>> helpful. There are differences between the schema.org/RDFa that is
>>>> embedded in WorldCat data and the exported WorldCat triples in the file.
>>>> One of those differences happens to be the source of the place of
>>>> publication, if I am reading it right. So, again, a request for
>>>> documentation on the fields included and their MARC source.
>>>>
>>>> Thanks,
>>>>
>>>> kc
>>>>
>>>> On 8/17/12 8:38 AM, Richard Wallis wrote:
>>>>
>>>>   In case you missed the press release earlier this week.
>>>>> You can now download a significant number of RDF triples describing the
>>>>> most highly held 1.2 million resources in WorldCat.  Licensed under
>>>>> ODC-BY.
>>>>>
>>>>> I've posted more details on my blog:
>>>>> http://dataliberate.com/2012/08/get-yourself-a-linked-data-piece-of-worldcat-to-play-with/
>>>>> ~Richard.
>>>>>
>>>>>   --
>>>> Karen Coyle
>>>> [log in to unmask] http://kcoyle.net
>>>> ph: 1-510-540-7596
>>>> m: 1-510-435-8234
>>>> skype: kcoylenet
>>>>
>>>>
>>>
>>
>
>

-- 
Karen Coyle
[log in to unmask] http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet