On 8/23/12 12:15 AM, Richard Wallis wrote:
> Hi Karen,
>
> Those that want to play with this data in their own triplestore may be
> interested in my post about doing that myself: Putting WorldCat Data Into
> A Triple Store <http://dataliberate.com/2012/08/putting-worldcat-data-into-a-triple-store/>

Thanks. Now I know what I was doing wrong in my SPARQL queries: I left off the <> around the URIs. Back I go to try again.

>
> I am intrigued by your identification of punctuation differences; it seems
> that one of the outputs has been through an extra cleanup step. I will
> find out.

My guess is that the person working with the code noticed the ending punctuation and thought "Now THAT'S stupid" and removed it. Good on them!

kc

>
> On the creation of a separate identifier for each instance of a place name:
> this is a symptom of the way the experimental data is created, using what
> are called blank nodes. Ideally we would have minted a URI for each unique
> place and linked all references to it. Unfortunately, this was not easily
> achievable as part of an experiment running on top of production WorldCat.
> Solving issues such as this is on the agenda as our work in this area
> evolves.
>
> Keep the comments coming; they are very helpful.
>
> ~Richard.
>
> On 23 August 2012 00:56, Karen Coyle <[log in to unmask]> wrote:
>
>> On 8/22/12 2:56 PM, Richard Wallis wrote:
>>
>>> Hi Karen,
>>>
>>> I was not ignoring your previous question about where, in MARC terms, the
>>> data was coming from. I need to talk with someone who was at the core of the
>>> processing that produces the data. Unfortunately I am currently being
>>> thwarted by vacations.
>>>
>> Richard, I understand, and apologize if I appeared to be pushing too hard.
>> In my own experience, requests for documentation are met with groans,
>> especially from folks who'd rather be "doing something useful," like writing
>> code. Unfortunately, it really helps to explain what you've done.
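Richard describes the ideal as minting a URI for each unique place and linking all references to it, instead of creating a new blank node per record. A minimal Python sketch of that idea, under loudly stated assumptions: the `http://example.org/place/` namespace and the hash-based naming are hypothetical (not OCLC's scheme), and `normalize_place` applies the kind of trailing-punctuation cleanup Karen guesses was done, stripping one trailing `:`, `;`, `,` or `/` from a 260 $a value while leaving periods alone so "N.Y." survives.

```python
import hashlib
import re

# HYPOTHETICAL namespace, for illustration only; not an OCLC/WorldCat URI pattern.
PLACE_BASE = "http://example.org/place/"

PLACE_OF_PUB = "<http://purl.org/library/placeOfPublication>"
SCHEMA_NAME = "<http://schema.org/name>"

# One trailing ISBD-style mark as seen in 260 $a ("New York :", "Garden City, N.Y.,").
# Periods are deliberately left alone so abbreviations such as "N.Y." survive.
TRAILING_PUNCT = re.compile(r"\s*[:;,/]\s*$")


def normalize_place(raw: str) -> str:
    """Drop one trailing ISBD mark and collapse runs of whitespace."""
    return " ".join(TRAILING_PUNCT.sub("", raw).split())


def mint_place_uri(name: str) -> str:
    """Derive a stable URI from the name (case-insensitively): equal names -> equal URIs."""
    digest = hashlib.sha1(name.casefold().encode("utf-8")).hexdigest()[:12]
    return f"<{PLACE_BASE}{digest}>"


def nt_literal(text: str) -> str:
    """Quote a string as an N-Triples literal, escaping backslash and double quote."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def place_triples(records):
    """records: iterable of (record_uri, raw_place_string) pairs.

    Every record whose place string normalizes to the same value is linked to
    the same minted URI, instead of each getting its own blank node.
    """
    lines, named = [], set()
    for record_uri, raw in records:
        name = normalize_place(raw)
        place = mint_place_uri(name)
        lines.append(f"<{record_uri}> {PLACE_OF_PUB} {place} .")
        if place not in named:  # emit the name triple only once per place
            named.add(place)
            lines.append(f"{place} {SCHEMA_NAME} {nt_literal(name)} .")
    return lines


if __name__ == "__main__":
    sample = [
        ("http://www.worldcat.org/oclc/524483", "Garden City, N.Y.,"),
        ("http://www.worldcat.org/oclc/43836713", "New York :"),
    ]
    for line in place_triples(sample):
        print(line)
```

Keying on the cleaned string is the weak point of this sketch: it would merge every "Springfield" into one place. Linking each record to an authority entry, rather than hashing the text, avoids that.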
>>
>> I think I've solved the question of where the place of publication comes
>> from: 260 $a. The differences between the Web version and the triples
>> version are in punctuation. I'm still looking at examples, but it's a slog,
>> since I'm re-creating records from the triples file with my minimal knowledge
>> of "grep": a hammer, but the best darned hammer there is. Here are some
>> examples:
>>
>> #1
>> File:
>>
>> <http://www.worldcat.org/oclc/43836713> <http://purl.org/library/placeOfPublication> _:AX2dX40d4c600X3aX138a12b56f9X3aXX2dX49b9
>> _:AX2dX40d4c600X3aX138a12b56f9X3aXX2dX49b9 <http://schema.org/name> "New York"
>>
>> Web: (using the RDFa It Firefox plugin [1])
>> <http://www.worldcat.org/oclc/43836713> a schema:Book;
>>     library:placeOfPublication [ a schema:Place;
>>         schema:name "New York :"@en ];
>>
>> #2
>>
>> File:
>> _:A52eb8ca1X3aX138a1313c61X3aXX2dX7536 <http://schema.org/name> "Garden City, N.Y." .
>> _:A52eb8ca1X3aX138a1313c61X3aXX2dX7536 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Place> .
>> <http://www.worldcat.org/oclc/524483> <http://purl.org/library/placeOfPublication> _:A52eb8ca1X3aX138a1313c61X3aXX2dX7536 .
>>
>> Web:
>> <http://www.worldcat.org/oclc/524483> a schema:Book;
>>     library:holdingsCount "803"@en;
>>     library:oclcnum "524483"@en;
>>     library:placeOfPublication [ a schema:Place;
>>         schema:name "Garden City, N.Y.,"@en ];
>>
>> Another piece of information is that each instance of a place of
>> publication string is given a new identity:
>>
>> _:AX2dX44931a01X3aX138a139ed19X3aXX2dX600d <http://schema.org/name> "Garden City, N.Y." .
>> _:AX2dX44a4d9f9X3aX138a132b9d9X3aXX2dX4efe <http://schema.org/name> "Garden City, N.Y." .
>> _:AX2dX44a4d9f9X3aX138a1378e1dX3aXX2dX1d8d <http://schema.org/name> "Garden City, N.Y." .
>> _:AX2dX45b46946X3aX138a139141cX3aXX2dX7073 <http://schema.org/name> "Garden City, N.Y." .
>> _:AX2dX4a6da202X3aX138a1387049X3aXX2dX7bcc <http://schema.org/name> "Garden City, N.Y." .
>> _:AX2dX4b32d4b9X3aX138a1316a9bX3aXX2dX5f92 <http://schema.org/name> "Garden City, N.Y." .
>> _:AX2dX4b5c4da3X3aX138a135d400X3aXX2dX515e <http://schema.org/name> "Garden City, N.Y." .
>> _:AX2dX4b93edacX3aX138a1314f3eX3aXX2dX58e9 <http://schema.org/name> "Garden City, N.Y." .
>> _:AX2dX4c810b47X3aX138a134150cX3aXX2dX5b77 <http://schema.org/name> "Garden City, N.Y." .
>> _:AX2dX4f8be47aX3aX138a12b4eb9X3aXX2dX1677 <http://schema.org/name> "Garden City, N.Y." .
>> _:AX2dX4f8be47aX3aX138a12b4eb9X3aXX2dX23e1 <http://schema.org/name> "Garden City, N.Y." .
>> _:AX2dX52ad903aX3aX138a12d336bX3aXX2dX5389 <http://schema.org/name> "Garden City, N.Y." .
>>
>> Where punctuation doesn't cloud the picture, these could eventually be
>> linked to:
>> http://id.loc.gov/authorities/names/n50068040.html
>> and:
>> http://www.geonames.org/5118226/garden-city.html
>>
>> and in that way could have a shared identity.
>>
>> kc
>>
>> p.s. Richard and I are on a list with someone who has loaded the triples
>> into a database. I will ask if we can announce it here, and will also try
>> to figure out how to use the SPARQL endpoint and provide some examples, if
>> that is OK with the dc:creator of the database.
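Karen's list shows a single "Garden City, N.Y." string spread across many blank nodes. The same check can be run over the whole download rather than by hand with grep. A minimal Python sketch, assuming the file is N-Triples with one triple per line (as in the excerpts above); the regex is a simplification, not a full N-Triples parser, and it only handles plain literals, so a language-tagged name such as "New York :"@en would be skipped.

```python
import re
from collections import defaultdict

# Matches:  _:node <http://schema.org/name> "literal" .
# Simplification: plain literals only; language-tagged or typed literals
# ("..."@en, "..."^^<type>) are skipped, and escapes are left unexpanded.
NAME_TRIPLE = re.compile(
    r'^(_:\S+)\s+<http://schema\.org/name>\s+"((?:[^"\\]|\\.)*)"\s*\.\s*$'
)


def blank_nodes_by_name(lines):
    """Group blank-node subjects by the schema:name string they carry."""
    groups = defaultdict(set)
    for line in lines:
        match = NAME_TRIPLE.match(line.strip())
        if match:
            node, name = match.groups()
            groups[name].add(node)
    return groups


def duplicated_names(groups):
    """Names carried by more than one distinct blank node, with their node counts."""
    return {name: len(nodes) for name, nodes in groups.items() if len(nodes) > 1}


if __name__ == "__main__":
    # A few lines copied from the message above.
    sample = [
        '_:AX2dX44a4d9f9X3aX138a132b9d9X3aXX2dX4efe <http://schema.org/name> "Garden City, N.Y." .',
        '_:AX2dX44a4d9f9X3aX138a1378e1dX3aXX2dX1d8d <http://schema.org/name> "Garden City, N.Y." .',
        '_:AX2dX45b46946X3aX138a139141cX3aXX2dX7073 <http://schema.org/name> "Garden City, N.Y." .',
    ]
    for name, count in duplicated_names(blank_nodes_by_name(sample)).items():
        print(count, name)
```

To scan the full dump, pass an open file object, which Python iterates line by line, e.g. `blank_nodes_by_name(open(path, encoding="utf-8"))`, where `path` is wherever the download was saved. Counting distinct nodes per name, rather than matching lines, is what exposes the problem: each name with many nodes is a candidate for one shared identity, such as the id.loc.gov or GeoNames entries mentioned above.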
>>
>> [1] javascript:location.href='http://www.w3.org/2012/pyRdfa/extract?format=turtle&uri='+escape(location.href)
>>
>>
>>> In the meantime, can you let me have a few examples of where you are
>>> seeing discrepancies between the downloaded triples and the RDFa embedded
>>> in WorldCat.org pages?
>>>
>>> ~Richard.
>>>
>>> On 22 August 2012 19:08, Karen Coyle <[log in to unmask]> wrote:
>>>
>>>> Richard, I've run into yet another area where documentation would be
>>>> helpful. There are differences between the schema.org/RDFa that is
>>>> embedded in WorldCat data and the exported WorldCat triples in the file.
>>>> One of those differences happens to be the source of the place of
>>>> publication, if I am reading it right. So, again, a request for
>>>> documentation on the fields included and their MARC source.
>>>>
>>>> Thanks,
>>>>
>>>> kc
>>>>
>>>> On 8/17/12 8:38 AM, Richard Wallis wrote:
>>>>
>>>>> In case you missed the press release earlier this week:
>>>>> You can now download a significant number of RDF triples describing the
>>>>> most highly held 1.2 million resources in WorldCat, licensed under
>>>>> ODC-BY.
>>>>>
>>>>> I've posted more details on my blog:
>>>>> http://dataliberate.com/2012/08/get-yourself-a-linked-data-piece-of-worldcat-to-play-with/
>>>>>
>>>>> ~Richard.
>>>>>
>>>> --
>>>> Karen Coyle
>>>> [log in to unmask] http://kcoyle.net
>>>> ph: 1-510-540-7596
>>>> m: 1-510-435-8234
>>>> skype: kcoylenet