LISTSERV 16.5 - CODE4LIB Archives

On 10 July 2012 23:13, Karen Coyle <[log in to unmask]> wrote:

> On 7/10/12 2:10 PM, Roy Tennant wrote:
>
>> Uh...what? For the given use case you would be much better off simply
>> using the WorldCat Search API response. Using it only to retrieve an
>> identifier and then going and scraping the Linked Data out of a
>> WorldCat.org page is, at best, redundant.
>>
>
> I do not consider using "linked data" to be "scraping" by any meaning of
> that term.


The tools and code libraries that extract the RDF wrapped in RDFa markup in
HTML are doing just that - scraping it out of the page markup.  However,
because it is embedded there in a structured form, that process can be
considered far more reliable than what is normally expected of a scraping
process, which is easily upset by visual changes.
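
To make that distinction concrete, here is a minimal Python sketch of what
such an extractor does at its simplest: it reads values from `property`
attributes instead of relying on where things sit in the visual layout. It is
not a conforming RDFa processor (it ignores subjects, prefixes, `typeof` and
the real nesting rules), and the class name, function name and sample markup
are illustrative assumptions, not anything taken from a WorldCat page.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not tracked as "open".
VOID_TAGS = {"meta", "link", "img", "br", "hr", "input"}


class PropertyExtractor(HTMLParser):
    """Collect (property, value) pairs from elements with a `property` attribute.

    Simplified on purpose: a value comes from `content`, `href` or `src` when
    present, otherwise from the element's text. This is an assumption made for
    illustration, not the full RDFa processing model.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._open = []  # one (tag, property or None, text parts) per open element

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        prop = attributes.get("property") or None
        attr_value = (attributes.get("content")
                      or attributes.get("href")
                      or attributes.get("src"))
        if prop and attr_value:
            # The value is carried by an attribute, so the text is not needed.
            self.pairs.append((prop, attr_value))
            prop = None
        if tag not in VOID_TAGS:
            self._open.append((tag, prop, []))

    def handle_data(self, data):
        # Text belongs to every enclosing element that is waiting for a value.
        for _tag, prop, parts in self._open:
            if prop is not None:
                parts.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or all(t != tag for t, _p, _x in self._open):
            return  # stray or void end tag: nothing to close
        while self._open:
            open_tag, prop, parts = self._open.pop()
            if prop is not None:
                # Collapse runs of whitespace in the collected text.
                self.pairs.append((prop, " ".join("".join(parts).split())))
            if open_tag == tag:
                break


def extract_properties(html):
    """Return the (property, value) pairs found in an HTML string."""
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


SAMPLE_HTML = """
<div vocab="http://schema.org/" typeof="Book">
  <h1 property="name">An  Example
      Title</h1>
  <a property="author" href="http://example.org/person/1">Some Author</a>
  <meta property="datePublished" content="2012">
</div>
"""

if __name__ == "__main__":
    for prop, value in extract_properties(SAMPLE_HTML):
        print(prop, "=", value)
```

Restyling the page or moving the elements around would not change the result,
as long as the `property` attributes stay in place.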

> Machine-actionable data is returned in formats like RDF/XML or ttl or JSON.


From the code and tools that interpret the RDFa in the page, yes.
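
As a rough illustration of that last step, the sketch below turns a list of
property/value pairs into N-Triples, one of the simpler RDF serializations.
The function names, the subject URI, and the rule that any value starting
with "http://" or "https://" is treated as a resource rather than a literal
are all assumptions made for the example; a real RDFa toolkit decides this
from the markup itself.

```python
def escape_literal(text):
    """Escape the characters that N-Triples requires to be escaped in literals."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def to_ntriples(subject, vocab, pairs):
    """Serialize (property, value) pairs about one subject as N-Triples.

    `vocab` is prepended to bare property names such as "name".
    Heuristic (an assumption): values that look like http(s) URLs become
    resources, everything else becomes a plain literal.
    """
    lines = []
    for prop, value in pairs:
        predicate = prop if "://" in prop else vocab + prop
        if value.startswith(("http://", "https://")):
            obj = "<" + value + ">"
        else:
            obj = '"' + escape_literal(value) + '"'
        lines.append("<%s> <%s> %s ." % (subject, predicate, obj))
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ntriples(
        "http://example.org/book/1",          # hypothetical subject URI
        "http://schema.org/",
        [("name", "An Example Title"),
         ("author", "http://example.org/person/1")],
    ))
```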


> And I'm curious that linked data is somehow not considered to be usable as
> "data" and that microformat data is not considered to be searchable


Of course it is usable as data - I think what Roy was getting at is that
you could have satisfied your use case with tools that were available
before the embedding of linked data into WorldCat detail pages.
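
For reference, the shape of the use case quoted further down this thread
(search for an author, then collect data for each hit) can be sketched as
below. The three callables are hypothetical placeholders: `search` stands in
for whatever search service is used, `fetch` for retrieving a record, and
`extract` for whatever turns a response into properties. None of them
represents an actual WorldCat API call.

```python
def build_bibliography(author, search, fetch, extract):
    """Collect one entry per distinct record URL returned for an author.

    search(author) -> iterable of record URLs      (hypothetical)
    fetch(url)     -> the record's HTML or data     (hypothetical)
    extract(body)  -> mapping of property to value  (hypothetical)

    Only exact URL repeats are skipped here; matching authors precisely and
    merging duplicate manifestations would still need editing afterwards.
    """
    seen = set()
    entries = []
    for url in search(author):
        if url in seen:
            continue
        seen.add(url)
        entry = dict(extract(fetch(url)))
        entry["url"] = url
        entries.append(entry)
    return entries
```

If the search response already carries the needed fields, `fetch` and
`extract` collapse into reading that response, which is the redundancy Roy
was pointing at.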


> -- in fact, its raison d'etre is search optimization.


Yes, one of the reasons for embedding structured data and identifiers, as
well as text [as Google puts it, 'things not strings'], is SEO.  I'm sure
that the search engines are already using it for that now.

However, SEO is not the only reason for linked data - [as a linked data
enthusiast] I would suggest that better SEO is a nice side benefit of
something much more powerful.  <evangelism>off</evangelism>


>
>
>> As Richard pointed out, some use cases -- like the one Karen provided
>> -- are not really a good use case for linked data. It's a better use
>> case for an API, which has been available for years.
>>
>
> But is it available to everyone, and is the data retrieved also usable as
> ODC-BY by any member of the Web public?


Yes it is, and at this stage it is only available from within an HTML page.

This experiment is the first step in a process to make linked data about
WorldCat resources available.  As it evolves over time, other areas such
as API access, content-negotiation, search & other query methods,
additional RDF data vocabularies, etc., will be considered in concert
with community feedback (such as this thread) as to the way forward.

Karen, I know you are eager to work with and demonstrate the benefits of
this way of publishing data.  But these things take time and effort, so
please be a little patient, and keep firing off these use cases and issues;
they are all valuable input.

~Richard.

>
>
> kc
>
>
>  Roy
>>
>> On Tue, Jul 10, 2012 at 2:08 PM, Kevin Ford <[log in to unmask]> wrote:
>>
>>> The use case clarifies perfectly.
>>>
>>> Totally feasible.  Well, I should say "totally feasible" with the caveat
>>> that I've never used the Worldcat Search API.  Not letting that stop me,
>>> so long as it is what I imagine it is, then a developer should be able to
>>> perform a search, retrieve the response, and, by integrating one of the
>>> tools advertised on the schema.org website into his/her code, then
>>> retrieve the microdata for each resource returned from the search (and
>>> save it as RDF or whatever).
>>>
>>> If someone has created something like this, do speak up.
>>>
>>> Yours,
>>>
>>> Kevin
>>>
>>>
>>>
>>>
>>>
>>> On 07/10/2012 04:48 PM, Karen Coyle wrote:
>>>
>>>> Kevin, if you misunderstand then I undoubtedly haven't been clear (let's
>>>> at least share the confusion :-)). Here's the use case:
>>>>
>>>> PersonA wants to create a comprehensive bibliography of works by
>>>> AuthorB. The goal is to do a search on AuthorB in WorldCat and extract
>>>> the RDFa data from those pages in order to populate the bibliography.
>>>>
>>>> Apart from all of the issues of getting a perfect match on authors and
>>>> of manifestation duplicates (there would need to be editing of the
>>>> results after retrieval at the user's end), how feasible is this? Assume
>>>> that the author is prolific enough that one wouldn't want to look up all
>>>> of the records by hand.
>>>>
>>>> kc
>>>>
>>>> On 7/10/12 1:43 PM, Kevin Ford wrote:
>>>>
>>>>> As for someone who might want to do this programmatically, he/she
>>>>> should take a look at the "Programming languages" section of the
>>>>> second link I sent along:
>>>>>
>>>>> http://schema.rdfs.org/tools.html
>>>>>
>>>>> There one can find Ruby, Python, and Java extractors and parsers
>>>>> capable of outputting RDF.  A developer can take one of these and
>>>>> programmatically get at the data.
>>>>>
>>>>> Apologies if I am misunderstanding your intent.
>>>>>
>>>>> Yours,
>>>>>
>>>>> Kevin
>>>>>
>>>>>
>>>>>
>>>>> On 07/10/2012 04:34 PM, Karen Coyle wrote:
>>>>>
>>>>>> Thanks, Kevin! And Richard!
>>>>>>
>>>>>> I'm thinking we need a good web site with links to tools. I had
>>>>>> already been introduced to
>>>>>>
>>>>>> http://www.w3.org/2012/pyRdfa/
>>>>>>
>>>>>> where you can paste a URI and get ttl or rdf/xml. These are all good
>>>>>> resources. But what about someone who wants to do this
>>>>>> programmatically, not through a web site? Richard's message indicates
>>>>>> that this isn't yet available, so perhaps we should be gathering use
>>>>>> cases to support the need? And have a place to post various solutions,
>>>>>> even ones that are not OCLC-specific? (Because I am hoping that the use
>>>>>> of microformats will increase in general.)
>>>>>>
>>>>>> kc
>>>>>>
>>>>>>
>>>>>> On 7/10/12 12:12 PM, Kevin Ford wrote:
>>>>>>
>>>>>>>> is there an open search to get one to the desired records in the
>>>>>>>> first place?
>>>>>>>
>>>>>>> -- I'm not certain this will fully address your question, but try
>>>>>>> these two sites:
>>>>>>>
>>>>>>> Website: http://www.google.com/webmasters/tools/richsnippets
>>>>>>> Example: http://tinyurl.com/dx3h5bg
>>>>>>>
>>>>>>> Website: http://linter.structured-data.org/
>>>>>>> Example: http://tinyurl.com/bmm8bbc
>>>>>>>
>>>>>>> These sites will extract the data, but I don't think you get your
>>>>>>> choice of serialization.  The data are extracted and displayed on the
>>>>>>> resulting page in the HTML, but at least you can *see* the data.
>>>>>>>
>>>>>>> Additionally, there are a number of "tools" to help with microdata
>>>>>>> extraction here:
>>>>>>>
>>>>>>> http://schema.rdfs.org/tools.html
>>>>>>>
>>>>>>> Some of these will allow you to output specific (RDF) serializations.
>>>>>>>
>>>>>>>
>>>>>>> HTH,
>>>>>>>
>>>>>>> Kevin
>>>>>>>
>>>>>>>
>>>>>>> On 07/10/2012 02:42 PM, Karen Coyle wrote:
>>>>>>>
>>>>>>>> I have demonstrated the schema.org/RDFa microdata in the WC
>>>>>>>> database to various folks and the question always is: how do I get
>>>>>>>> access to this? (The only source I have is the Facebook API, me being
>>>>>>>> a "user" rather than a "maker".) The microdata is CC-BY once you get
>>>>>>>> a Worldcat URI, but is there an open search to get one to the desired
>>>>>>>> records in the first place? I'm poorly-versed in WC APIs so I'm
>>>>>>>> hoping others have a better grasp.
>>>>>>>>
>>>>>>>> @rjw: the OCLC website does a thorough job of hiding email
>>>>>>>> addresses or I would have asked this directly. Then again, a
>>>>>>>> discussion here could have added value.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> kc
>>>>>>>>
>>>>>>>>
> --
> Karen Coyle
> [log in to unmask] http://kcoyle.net
> ph: 1-510-540-7596
> m: 1-510-435-8234
> skype: kcoylenet
>



-- 
Richard Wallis
Founder, Data Liberate
http://dataliberate.com
Tel: +44 (0)7767 886 005

Linkedin: http://www.linkedin.com/in/richardwallis
Skype: richard.wallis1
Twitter: @rjw
IM: [log in to unmask]