LISTSERV 16.5 - CODE4LIB Archives

On Fri, Jan 24, 2014 at 7:56 AM, Jon Phipps <[log in to unmask]> wrote:

> Hi Rob, the conversation continues below...
>
> On Thu, Jan 23, 2014 at 7:01 PM, Robert Sanderson <[log in to unmask]> wrote:
>
> > Hi Jon,
> >
> > To present the other side of the argument so that others on the list can
> > make an informed decision...
> >
>
> Thanks for reminding me that this is an academic panel discussion in front
> of an audience, rather than a conversation.
>
> >
> > On Thu, Jan 23, 2014 at 4:22 PM, Jon Phipps <[log in to unmask]> wrote:
> >
> > > I've developed a quite strong opinion that vocabulary developers should not
> > > _ever_ think that they can understand the semantics of a vocabulary
> > > resource by 'reading' the URI.
> >
> >
> > 100% Agreed. Good documentation is essential for any ontology, and it has
> > to be read to understand the semantics. You cannot just look at
> > oa:hasTarget, out of context, and have any idea what it refers to.
> >
> > However, if that URI is readable it makes developers' lives much easier in a
> > lot of situations, and it has no additional cost. Using opaque URIs for
> > predicates is the digital equivalent of thumbing your nose at the people
> > you should be courting -- the people who will actually use your ontology in
> > any practical sense.  It says: We don't care about you enough to make your
> > life one step easier by having something that's memorable. You will always
> > have to go back to the ontology every time and reread this documentation,
> > over and over and over again.
> >
>
> What you suggest is that an identifier (e.g. @azaroth42 or ORCID:
> 0000-0003-4441-6852 <https://orcid.org/0000-0003-4441-6852>) should always
> be readable as a convenience to the developer. RDA does provide a 'readable
> in the language of the reader' URI specifically as a convenience to the
> developer, a feature that I lobbied for. It's just not the /canonical/ URI,
> because it's an identifier of a property, not the property itself, and that
> property is independent of the language used to label it.
>
> It's the difference between Metadata Management Associates, PO Box 282,
> Jacksonville, NY 14854, USA (for people) and 14854-0282 (a perfectly
> functional complete address in the USA namespace), which is precisely the
> same identifier of that box for machines, and ultimately for the
> postmaster, who doesn't care whose name is on the box numbered 282, who
> only needs to know that highly memorable name when someone uses the
> convenience of not bothering to look up the box number and just sends mail
> addressed to us at 14854, or even just Jacksonville. And no I don't want to
> start a URL vs. URI/URN/IRI discussion.
>
> >
> > > Do you have some expectation that in order
> > > for the data to be useful your relational or object database identifiers
> > > must be readable?
> >
> >
> > Identifiers for objects, no. The table names and field names? Yes. How many
> > DBAs do you know that create tables with opaque identifiers for the column
> > names?  How many XML schemas do you know that use opaque identifiers for
> > the element names?
> >
> > My count is 0 from many many many instances.  And the reason is the same as
> > having readable predicate URIs -- so that when you look at the table,
> > schema, ontology, triple or what have you, there is some mnemonic value
> > from the name to its intent.
> >
> Our experience obviously differs in this regard. I've seen many, many
> databases that have relatively opaque column identifiers that were
> relabeled in the query to suit the audience for the query. I've seen many
> French databases, with French content, intended for a French audience,
> designed by French developers, that had French 'column headers'.
>
> The point here is that the identifiers /identify/ a property that exists
> independent of the language of the data being used to describe a resource.
> If RDA _had_ to pick a single language to satisfy your requirement for a
> single readable identifier, which one? To assume that the one language
> should be English says to the non-English-speaking world, "We don't care
> about you enough to make your life one step easier by having something
> that's memorable."
>
>
> >
> > > By whom, and in English? This to me is a frankly colonial
> > > assumption of the dominance of English in the world of metadata.
> >
> >
> > In the world of computing in general. "for" "if" "while" ... all English.
> > Among Turing-complete languages, the ones that don't have real-world
> > language constructions are toys, like Whitespace for example.  Even the
> > lolcats programming language is more usable than Whitespace.
> >
> > Again, it's a cost/value consideration.  There are many people who will
> > understand English, and when developers program, they're surrounded by it.
> > If your intended audience is primarily people who speak French, then you
> > would be entirely justified in using URIs with labels from French. Or
> > Chinese, though the IRI expansion would be more of a pain :)
> >
> >
> >
> Despite the fact that developers are surrounded by English, I've worked with
> many highly skilled developers who didn't speak or read English and who
> relied on documentation and meetings in their own language. What RDA is trying to
> convey is the specific bibliographic knowledge, admittedly limited by the
> cultural context of the North American and European bibliographic
> communities, that can be broken down into classes of things and some
> properties of those things. An English URI is often nearly as opaque as a
> numeric URI to a non-English-speaking programmer and immediately
> communicates an Anglo-American bias.
>
> RDA's intended audience, as is the case with everything intended to
> function in the global web of data, is the entire world in every language.
> Identifying a thing using a cultural and language specific word or phrase
> instantly biases the general understanding of that thing. And RDA is trying
> very hard to avoid that a priori cultural bias as much as possible.
>
>
> > > The proper
> > > understanding of the semantics, although still relatively minimal, is
> > from
> > > the definition, not the URI.
> >
> >
> > Yes. Any shortcuts to *understanding* rather than *remembering* are to be
> > avoided.
> >
> >
> >
> > > Our coining and inclusion of multilingual
> > > (eventually) lexical URIs based on the label is a concession to developers
> > > who feel that they can't effectively 'use' the vocabularies unless they can
> > > read the URIs.
> >
> >
> > So in my opinion, as is everything in the mail of course, this is even
> > worse. Now instead of 1600 properties, you have 1600 * (number of languages
> > + 1) properties. And you're going to see them appearing in uses of the
> > ontology. Either stick with your opaque identifiers or pick a language for
> > the readable ones, and best practice would be English, but doing both is a
> > disaster in the making.
> >
> >
> Best practice is not ever English, for the non-English-speaking world.
>
>
> >
> > > I grant that writing ad
> > > hoc SPARQL queries with opaque URIs can be intensely frustrating, but the
> > > vocabularies aren't designed specifically to support that incredibly narrow
> > > use case.
> >
> >
> > Writing queries is something developers have to do to work with data. More
> > importantly, writing code that builds the triples in the first place is
> > something that developers have to do. And they have to get it right ...
> > which they likely won't do the first time. There will be typos. That P1523235
> > might be written into the code as P1533235 ... an impossible-to-spot typo.
> > dc:title vs dc:titel ... a bit easier to spot, no?
> >
>
> A machine trying to resolve a misspelled, non-existent URI is a much
> better spell-checker than any developer will ever be. The problem here is
> that if RDA truly wants to be multilingual, and avoid the cultural bias of
> English identifiers, then it either has to provide multiple lexical
> identifiers, or provide a lookup service, like many providers of resources
> identified by opaque identifiers.
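
The "machine as spell-checker" point can be sketched roughly as below. This is only an illustration under stated assumptions: the namespace URI is a placeholder rather than the real RDA one, the vocabulary is a hard-coded set rather than something fetched from a registry, and the function name unknown_predicates is made up for this example.

```python
# Hypothetical namespace; a real check would load the published element set.
RDAA = "http://example.org/rdaa/"

# Assumed vocabulary: the two identifiers mentioned in this thread.
KNOWN_PROPERTIES = {RDAA + "P50209", RDAA + "addresseeOf"}


def unknown_predicates(triples, vocabulary=KNOWN_PROPERTIES, namespace=RDAA):
    """Return the predicates in `namespace` that the vocabulary does not define."""
    return sorted(
        {p for _, p, _ in triples if p.startswith(namespace) and p not in vocabulary}
    )


triples = [
    ("http://example.org/person/1", RDAA + "P50209", "http://example.org/letter/1"),
    # Transposed digits: exactly the kind of typo a human reader will miss.
    ("http://example.org/person/2", RDAA + "P50290", "http://example.org/letter/2"),
]

print(unknown_predicates(triples))
```

A check like this treats opaque and lexical identifiers the same way: either the URI is defined in the vocabulary or it is not.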
>
>
> >
> > So the consequence is that the quality of the uses of your ontology will go
> > down.  If there were 16 fields, maybe there'd be a chance of getting it
> > right. But 1600, with 5-digit identifiers, is asking for trouble.
>
>
> > Compare MARC fields. We all love our 245$a, I know, but dc:title is a lot
> > easier to recall. Now imagine those fields are (seemingly) random 5-digit
> > codes without significant structure. And that there are 1600 of them. And
> > you're asking the developer to use a graph structure that's likely
> > unfamiliar to them.
> >
>
> Just to clarify:
>
> You (and others who think like you in the audience) would be fine with:
> rdaa:addresseeOf a rdf:Property ;
>     owl:sameAs rdaa:P50209 .
>
> but not:
> rdaa:P50209 a rdf:Property ;
>     owl:sameAs rdaa:addresseeOf .
>
> Both say precisely the same thing about the same resource. You would also
> hold that dozens or hundreds of lexical identifiers for the same thing, just
> to make life easier for developers, are a bad thing, and that best practice
> would be to coin a single, readable-in-English URI.
>
> I'm afraid that I won't ever agree with that perspective, when producing
> data for global distribution and consumption.
>
> I'm personally not entirely happy with hundreds of sameAs lexical URIs. An
> alternative would be a lookup service that, given a label, returns the
> canonical URI. But I think that's more of an inconvenience to the developer
> than the simple ability to use a memorable URI, based on a label in their
> language, and have it resolve (permanently) to a canonical, opaque URI when
> accessed by a machine: "Use 'em all, and let the machines figure it out."
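
A minimal sketch of "use 'em all, and let the machines figure it out", assuming an in-memory alias table in place of real HTTP resolution. The namespace URI is a placeholder, the French local name is invented purely for illustration, and canonicalize is a hypothetical helper, not part of any RDA tooling.

```python
# Hypothetical namespace; stands in for wherever the vocabulary is published.
RDAA = "http://example.org/rdaa/"

# Lexical (language-specific) URIs mapped to the one canonical, opaque URI.
# "addresseeOf" and "P50209" come from this thread; the French name is made up.
ALIASES = {
    RDAA + "addresseeOf": RDAA + "P50209",
    RDAA + "destinataireDe": RDAA + "P50209",
}
CANONICAL = set(ALIASES.values())


def canonicalize(uri):
    """Map any known lexical alias to its canonical URI; reject unknown URIs."""
    if uri in CANONICAL:
        return uri
    try:
        return ALIASES[uri]
    except KeyError:
        raise LookupError(f"unknown property URI: {uri}") from None


# Data written with different lexical URIs ends up using the same predicate.
print(canonicalize(RDAA + "addresseeOf"))
print(canonicalize(RDAA + "destinataireDe"))
```

The trade-off debated above shows up directly: every added language grows the alias table, but anything stored after canonicalization uses a single opaque identifier.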
>
>
> > All in my opinion, and all debatable. I hope that your choice goes well for
> > you,
>
>
> I'd like to repeat: just because I agree with that choice, and I'm
> defending it here, it wasn't my choice. Not at all. And the concerns you
> express were well-aired and very carefully considered before the choice was
> made.
>
>
> > but would like other people to think about it carefully before
> > following suit.
> >
>
> Me too! :-)
>
> Jon
> ...who now has to go deal with the consequences of an ill-considered
> decision to deploy an unfamiliar nginx server, on a tight deadline, instead
> of my happy buddy Apache
>