As an FYI, as far as I am aware the search engines do not use content negotiation (e.g., asking for application/rdf+xml) when looking for structured data such as schema.org in their crawl process. They expect to find it embedded in the HTML as Microdata, RDFa, or, increasingly, JSON-LD in a script tag.

~Richard

On 31 March 2016 at 15:24, Kevin Ford wrote:

> Hi Brian,
>
> I've tried the wget command and curl and in both cases I just get the HTML
> version of the document. I don't think any meaningful content negotiation
> is happening. It's probably as Karen suspected: they didn't return and
> embed schema in older reviews. Are you getting something else?
>
> I think the tool Karen is using takes the URL as the identifier (logical)
> and converts the '<meta name=description ' tag into schema:description
> (which seems fair). That's how the tool comes up with the little bit it
> does for this item.
>
> Yours,
> Kevin
>
> p.s. Curl command I used:
>
> curl -L -H 'Application/rdf+xml' http://bmcr.brynmawr.edu/2014/2014-02-18.html | grep schema
>
> I tried a few variations, such as removing the .html from the end of the
> URL etc. Nada.
>
> On 03/31/2016 08:39 AM, Brian Kennison wrote:
>
>> On Mar 29, 2016, at 12:46 PM, Kevin Ford wrote:
>>
>> FWIW, I'm looking at the HTML itself. You may be using a tool that is
>> generating a little bit of schema. Is that accurate?
>>
>> Kevin,
>>
>> I was perplexed by this also but I realized that there was “content
>> negotiation” going on. I set the header to accept rdf and indeed there is
>> data for this document.
>>
>> -Brian
>>
>> wget --header "Accept: application/rdf+xml" http://bmcr.brynmawr.edu/2014/2014-02-18.html
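
To make the point about embedded markup concrete, here is a minimal Python sketch of how a consumer might pull JSON-LD out of a page's script tags. It is only an illustration, not how any particular search engine works. The names JsonLdExtractor and extract_jsonld, the sample page, and the example.org URL are all made up for this example; only the standard library is used.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            # Attribute values can be None (e.g. <script async>), hence the "or".
            script_type = (dict(attrs).get("type") or "").strip().lower()
            if script_type == "application/ld+json":
                self._in_jsonld = True
                self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed blocks rather than failing the whole page.


def extract_jsonld(html):
    """Return a list of parsed JSON-LD objects embedded in the HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical page, invented for illustration (not the BMCR review).
SAMPLE_HTML = """
<html><head>
<title>Example review</title>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "Review",
 "name": "Example review", "url": "http://example.org/reviews/1.html"}
</script>
</head><body><p>Review text.</p></body></html>
"""

if __name__ == "__main__":
    for item in extract_jsonld(SAMPLE_HTML):
        print(item.get("@type"), item.get("url"))
```

A page with no such script tag simply yields an empty list, which is roughly what grepping the older review's HTML for "schema" amounts to. Separately, if anyone wants to retry the negotiation test: the header needs its field name, as in Brian's wget ("Accept: application/rdf+xml"). The quoted curl command passes only the media type, so it may not have sent an Accept header at all.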