LISTSERV 16.5 - CODE4LIB Archives

As the person who created the patches for Bryn Mawr (Perl and Tcl and
SGML--oh my!) back in 2014, I can confirm that their publication process
generates static HTML. Any new pages should include RDFa (schema.org), but
the legacy pages would have needed to be regenerated to pick it up.

It was fun working with Karen Coyle and Camilla MacKay on figuring out which
schema.org markup we could deploy to enrich what was largely a set of static
strings... albeit in some instances nicely marked up with SGML elements.

I believe Bryn Mawr was hoping to move to a brand-new publishing platform
that would incorporate much richer linked open data starting from the
authoring side, so perhaps they chose not to go back and regenerate the
older pages. There were some variations in the SGML and publishing process
over time that I tried to accommodate, but they might have decided it
wasn't worth the risk of republishing (or the time to revalidate all of the
output) in the interim.
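
For anyone who wants to check whether a given page was regenerated, here is
a minimal sketch in Python that looks for embedded structured data in the
HTML itself, which is where the markup would live on a statically published
site. The class and function names (MarkupDetector, embedded_markup_kinds)
and the sample HTML strings are hypothetical and only for illustration; this
is not part of the patches or of the BMCR publishing process, and the
attribute checks are a rough heuristic rather than a full RDFa or Microdata
parser.

```python
from html.parser import HTMLParser


class MarkupDetector(HTMLParser):
    """Collect which structured-data syntaxes appear in an HTML document."""

    def __init__(self):
        super().__init__()
        self.kinds = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # RDFa heuristic: a vocab or typeof attribute. A bare property
        # attribute is not counted, because Open Graph <meta property=...>
        # tags use it without any schema.org markup being present.
        if "vocab" in attr or "typeof" in attr:
            self.kinds.add("rdfa")
        # Microdata heuristic: itemscope or itemtype attributes.
        if "itemscope" in attr or "itemtype" in attr:
            self.kinds.add("microdata")
        # JSON-LD: a script element carrying the JSON-LD media type.
        if tag == "script" and (attr.get("type") or "").lower() == "application/ld+json":
            self.kinds.add("json-ld")


def embedded_markup_kinds(html):
    """Return a sorted list of the structured-data syntaxes found in html."""
    detector = MarkupDetector()
    detector.feed(html)
    detector.close()
    return sorted(detector.kinds)


# Hypothetical samples: a legacy-style page with only a description meta tag,
# and a regenerated-style fragment carrying RDFa.
legacy = ('<html><head><meta name="description" content="A review"></head>'
          '<body><p>Review text</p></body></html>')
regenerated = ('<div vocab="http://schema.org/" typeof="Review">'
               '<span property="name">Title</span></div>')

print(embedded_markup_kinds(legacy))       # []
print(embedded_markup_kinds(regenerated))  # ['rdfa']
```

An empty list for a legacy page would be consistent with that page never
having been republished after the patches went in.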

On Thu, 31 Mar 2016 at 10:43 Richard Wallis <[log in to unmask]>
wrote:

> As an FYI, as far as I am aware the search engines do not access pages using
> content negotiation (e.g. asking for application/rdf+xml) when looking for
> structured data such as schema.org in their crawl process.
>
> They expect to find it embedded in the HTML as Microdata, RDFa, or
> increasingly JSON-LD in a script tag.
>
> ~Richard
>
>
> On 31 March 2016 at 15:24, Kevin Ford <[log in to unmask]> wrote:
>
> > Hi Brian,
> >
> > I've tried the wget command and curl and in both cases I just get the
> > HTML version of the document.  I don't think any meaningful content
> > negotiation is happening.  It's probably as Karen suspected: they didn't
> > return and embed schema in older reviews.  Are you getting something else?
> >
> > I think the tool Karen is using takes the URL as the identifier (logical)
> > and converts the '<meta name=description ' tag into schema:description
> > (which seems fair).  That's how the tool comes up with the little bit it
> > does for this item.
> >
> > Yours,
> > Kevin
> >
> > p.s.  Curl command I used:
> >
> >  curl -L -H 'Accept: application/rdf+xml' http://bmcr.brynmawr.edu/2014/2014-02-18.html | grep schema
> >
> > I tried a few variations, such as removing the .html from the end of the
> > URL etc.  Nada.
> >
> >
> >
> >
> > On 03/31/2016 08:39 AM, Brian Kennison wrote:
> >
> >>
> >> On Mar 29, 2016, at 12:46 PM, Kevin Ford <[log in to unmask]> wrote:
> >>
> >> FWIW, I'm looking at the HTML itself.  You may be using a tool that is
> >> generating a little bit of schema.  Is that accurate?
> >>
> >> Kevin,
> >>
> >> I was perplexed by this also but I realized that there was “content
> >> negotiation” going on. I set the header to accept rdf and indeed there is
> >> data for this document.
> >>
> >> —Brian
> >>
> >> wget --header "Accept: application/rdf+xml"
> >> http://bmcr.brynmawr.edu/2014/2014-02-18.html
> >>
> >>
>