LISTSERV 16.5 - CODE4LIB Archives

As an FYI, as far as I am aware the search engines do not access pages using
content negotiation (e.g., asking for application/rdf+xml) when looking for
structured data such as schema.org in their crawl process.

They expect to find it embedded in the HTML as Microdata, RDFa, or
increasingly JSON-LD in a script tag.
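
To check a page for the embedded case, something like the following rough
sketch may help. It uses only the Python standard library to pull JSON-LD
blocks out of an HTML string. The sample markup and the names
JsonLdExtractor, extract_jsonld and SAMPLE_HTML are made up for illustration
(this is not the BMCR page), and it does not cover Microdata or RDFa.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        # The type attribute may be missing or valueless, hence the "or".
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content can arrive in several chunks, so accumulate it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip blocks that are not valid JSON


def extract_jsonld(html):
    """Return a list of decoded JSON-LD objects found in the HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical sample page, for illustration only.
SAMPLE_HTML = """<html><head>
<meta name="description" content="A hypothetical review page">
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "Review", "name": "Sample review"}
</script>
</head><body><p>Review text</p></body></html>"""

if __name__ == "__main__":
    for item in extract_jsonld(SAMPLE_HTML):
        print(item.get("@type"), "-", item.get("name"))
```

An empty list back from extract_jsonld only means there is no JSON-LD; the
page could still carry Microdata or RDFa attributes in the HTML itself.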

~Richard


On 31 March 2016 at 15:24, Kevin Ford <[log in to unmask]> wrote:

> Hi Brian,
>
> I've tried the wget command and curl and in both cases I just get the HTML
> version of the document.  I don't think any meaningful content negotiation
> is happening.  It's probably as Karen suspected: they didn't return and
> embed schema in older reviews.  Are you getting something else?
>
> I think the tool Karen is using takes the URL as the identifier (logical)
> and converts the '<meta name=description ' tag into schema:description
> (which seems fair).  That's how the tool comes up with the little bit it
> does for this item.
>
> Yours,
> Kevin
>
> p.s.  Curl command I used:
>
>  curl -L -H 'Application/rdf+xml'
> http://bmcr.brynmawr.edu/2014/2014-02-18.html | grep schema
>
> I tried a few variations, such as removing the .html from the end of the
> URL etc.  Nada.
>
>
>
>
> On 03/31/2016 08:39 AM, Brian Kennison wrote:
>
>>
>> On Mar 29, 2016, at 12:46 PM, Kevin Ford <[log in to unmask]> wrote:
>>
>> FWIW, I'm looking at the HTML itself.  You may be using a tool that is
>> generating a little bit of schema.  Is that accurate?
>>
>> Kevin,
>>
>> I was perplexed by this also but I realized that there was “content
>> negotiation” going on. I set the header to accept rdf and indeed there is
>> data for this document.
>>
>> —Brian
>>
>> wget --header "Accept: application/rdf+xml"
>> http://bmcr.brynmawr.edu/2014/2014-02-18.html
>>
>>