Hi Rob, the conversation continues below...

On Thu, Jan 23, 2014 at 7:01 PM, Robert Sanderson wrote:

> Hi Jon,
>
> To present the other side of the argument so that others on the list can make an informed decision...
>

Thanks for reminding me that this is an academic panel discussion in front of an audience, rather than a conversation.

>
> On Thu, Jan 23, 2014 at 4:22 PM, Jon Phipps wrote:
>
> > I've developed a quite strong opinion that vocabulary developers should not _ever_ think that they can understand the semantics of a vocabulary resource by 'reading' the URI.
> >
>
> 100% Agreed. Good documentation is essential for any ontology, and it has to be read to understand the semantics. You cannot just look at oa:hasTarget, out of context, and have any idea what it refers to.
>
> However, if that URI is readable it makes developers' lives much easier in a lot of situations, and it has no additional cost. Opaque URIs for predicates are the digital equivalent of thumbing your nose at the people you should be courting -- the people who will actually use your ontology in any practical sense. They say: We don't care about you enough to make your life one step easier by having something that's memorable. You will always have to go back to the ontology every time and reread this documentation, over and over and over again.
>

What you suggest is that an identifier (e.g. @azaroth42 or ORCID: 0000-0003-4441-6852 <https://orcid.org/0000-0003-4441-6852>) should always be readable as a convenience to the developer. RDA does provide a URI that is 'readable in the language of the reader' specifically as a convenience to the developer, a feature that I lobbied for. It's just not the /canonical/ URI, because it's an identifier of a property, not the property itself, and that property is independent of the language used to label it.
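To make the split between a canonical identifier and its readable conveniences concrete, here is a minimal Python sketch. It is not RDA's actual implementation: the namespace `http://example.org/rda/a/`, the French label, the per-language path segment, and the camelCase rule for deriving a lexical URI from a label are all assumptions for illustration. Only `P50209` and `addresseeOf` come from this thread.

```python
# Minimal sketch (not RDA's implementation): one canonical, opaque property URI
# with labels in several languages, plus readable "lexical" alias URIs derived
# from those labels. Namespace, French label, and derivation rule are assumed.

CANONICAL_NS = "http://example.org/rda/a/"  # hypothetical namespace

# The property's identity is its opaque ID; the labels only describe it.
PROPERTY = {
    "id": "P50209",
    "labels": {
        "en": "addressee of",
        "fr": "destinataire de",  # hypothetical French label
    },
}


def camel_case(label: str) -> str:
    """Turn a label such as 'addressee of' into 'addresseeOf' (assumed rule)."""
    first, *rest = label.split()
    return first.lower() + "".join(w[:1].upper() + w[1:] for w in rest)


def canonical_uri(prop: dict) -> str:
    """The language-independent identifier that machines should store."""
    return CANONICAL_NS + prop["id"]


def lexical_uri(prop: dict, lang: str) -> str:
    """A readable convenience alias in one language; not the property's identity."""
    return CANONICAL_NS + lang + "/" + camel_case(prop["labels"][lang])


if __name__ == "__main__":
    # Each alias would be published as owl:sameAs the canonical URI.
    for lang in PROPERTY["labels"]:
        print(lexical_uri(PROPERTY, lang), "owl:sameAs", canonical_uri(PROPERTY))
    # http://example.org/rda/a/en/addresseeOf owl:sameAs http://example.org/rda/a/P50209
    # http://example.org/rda/a/fr/destinataireDe owl:sameAs http://example.org/rda/a/P50209
```

Adding a language means adding a label; the canonical URI, and any data already stored against it, never changes.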
It's the difference between "Metadata Management Associates, PO Box 282, Jacksonville, NY 14854, USA" (for people) and "14854-0282" (a perfectly functional, complete address in the USA namespace). The latter is precisely the same identifier of that box for machines, and ultimately for the postmaster, who doesn't care whose name is on box 282. The postmaster only needs that highly memorable name when someone takes the convenient shortcut of not looking up the box number and just sends mail addressed to us at 14854, or even just Jacksonville. And no, I don't want to start a URL vs. URI/URN/IRI discussion.

> > Do you have some expectation that in order for the data to be useful your relational or object database identifiers must be readable?
> >
>
> Identifiers for objects, no. The table names and field names? Yes. How many DBAs do you know that create tables with opaque identifiers for the column names? How many XML schemas do you know that use opaque identifiers for the element names?
>
> My count is 0 from many many many instances. And the reason is the same as having readable predicate URIs -- so that when you look at the table, schema, ontology, triple or what have you, there is some mnemonic value from the name to its intent.
>
>

Our experience obviously differs in this regard. I've seen many, many databases that have relatively opaque column identifiers that were relabeled in the query to suit the audience for the query. I've seen many French databases, with French content, intended for a French audience, designed by French developers, that had French 'column headers'. The point here is that the identifiers /identify/ a property that exists independently of the language of the data used to describe a resource. If RDA _had_ to pick a single language to satisfy your requirement for a single readable identifier, which one?
To assume that the one language should be English says to the non-English-speaking world "We don't care about you enough to make your life one step easier by having something that's memorable."

>
> > By whom, and in English? This to me is a frankly colonial assumption of the dominance of English in the world of metadata.
> >
>
> In the world of computing in general. "for" "if" "while" ... all English. While there are Turing-complete languages out there, the ones that don't have real-world language constructions are toys, like Whitespace for example. Even the lolcats programming language is more usable than Whitespace.
>
> Again, it's a cost/value consideration. There are many people who will understand English, and when developers program, they're surrounded by it. If your intended audience is primarily people who speak French, then you would be entirely justified in using URIs with labels from French. Or Chinese, though the IRI expansion would be more of a pain :)
>
>

Despite the fact that developers are surrounded by English, I've worked with many highly skilled developers who didn't speak or read English and who relied on documentation and meetings in their own language. What RDA is trying to convey is specific bibliographic knowledge, admittedly limited by the cultural context of the North American and European bibliographic communities, that can be broken down into classes of things and some properties of those things. To a non-English-speaking programmer, an English URI is often nearly as opaque as a numeric one, and it immediately communicates an Anglo-American bias. RDA's intended audience, as is the case with everything intended to function in the global web of data, is the entire world, in every language. Identifying a thing using a culture- and language-specific word or phrase instantly biases the general understanding of that thing, and RDA is trying very hard to avoid that a priori cultural bias as much as possible.
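The database practice described earlier in this thread, opaque column identifiers relabeled in the query to suit its audience, can be sketched with Python's built-in `sqlite3` module and SQL column aliases. The table name, the column codes `c017`/`c042`, and the labels are hypothetical; the point is only that the stored identifier stays language-neutral while each audience sees headers in its own language.

```python
# Sketch: opaque column codes relabeled for each audience inside the query,
# using SQL column aliases. Table name, column codes and labels are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resource (c017 TEXT, c042 TEXT)")
conn.execute("INSERT INTO resource VALUES (?, ?)", ("Sample title", "Sample addressee"))

# Same opaque columns, different human-facing headers per language.
HEADERS = {
    "fr": {"c017": "titre", "c042": "destinataire"},
    "en": {"c017": "title", "c042": "addressee"},
}


def query_for(lang: str) -> tuple:
    """Run one query, relabeling the opaque columns for the given audience."""
    labels = HEADERS[lang]
    # Aliases cannot be bound as parameters; these labels are trusted constants.
    aliases = ", ".join(f'{col} AS "{label}"' for col, label in labels.items())
    cur = conn.execute(f"SELECT {aliases} FROM resource")
    headers = [d[0] for d in cur.description]
    return headers, cur.fetchone()


print(query_for("fr"))  # (['titre', 'destinataire'], ('Sample title', 'Sample addressee'))
print(query_for("en"))  # (['title', 'addressee'], ('Sample title', 'Sample addressee'))
```

The schema never changes when a new audience is added; only the label table grows, which mirrors the argument for language-neutral canonical property URIs.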
>
> > The proper understanding of the semantics, although still relatively minimal, is from the definition, not the URI.
> >
>
> Yes. Any shortcuts to *understanding* rather than *remembering* are to be avoided.
>
> >
> > Our coining and inclusion of multilingual (eventually) lexical URIs based on the label is a concession to developers who feel that they can't effectively 'use' the vocabularies unless they can read the URIs.
> >
>
> So in my opinion, as is everything in the mail of course, this is even worse. Now instead of 1600 properties, you have 1600 * (number of languages + 1) properties. And you're going to see them appearing in uses of the ontology. Either stick with your opaque identifiers or pick a language for the readable ones, and best practice would be English, but doing both is a disaster in the making.
>
>

Best practice is never English, for the non-English-speaking world.

>
> > I grant that writing ad hoc SPARQL queries with opaque URIs can be intensely frustrating, but the vocabularies aren't designed specifically to support that incredibly narrow use case.
> >
>
> Writing queries is something developers have to do to work with data. More importantly, writing code that builds the triples in the first place is something that developers have to do. And they have to get it right ... which they likely won't do the first time. There will be typos. That P1523235 might be written into the code as P1533235 ... an impossible-to-spot typo. dc:title vs dc:titel ... a bit easier to spot, no?
>

A machine trying to resolve a misspelled, non-existent URI is a much better spell-checker than any developer will ever be. The problem here is that if RDA truly wants to be multilingual and avoid the cultural bias of English identifiers, then it has to either provide multiple lexical identifiers or provide a lookup service, as many providers of resources identified by opaque identifiers do.
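Both options, multiple lexical identifiers and a lookup service, plus the "machine as spell-checker" point, can be illustrated with one small resolver. This is a hypothetical Python sketch, not an RDA Registry API: the namespace, the alias URIs, and the label table are assumptions. Unknown URIs are rejected with close matches from the standard library's `difflib.get_close_matches`, so a typo like `P50290` for `P50209` fails loudly instead of silently producing bad triples.

```python
# Sketch of a vocabulary resolver: lexical alias URIs and labels in several
# languages all map to one canonical opaque URI, and unknown URIs (typos) are
# rejected rather than silently accepted. All URIs and labels are hypothetical.
import difflib

NS = "http://example.org/rda/a/"  # hypothetical namespace

CANONICAL = {NS + "P50209"}

ALIASES = {  # lexical URI -> canonical URI (the owl:sameAs links)
    NS + "en/addresseeOf": NS + "P50209",
    NS + "fr/destinataireDe": NS + "P50209",
}

LABELS = {  # (language, label) -> canonical URI, for a lookup service
    ("en", "addressee of"): NS + "P50209",
    ("fr", "destinataire de"): NS + "P50209",
}


def resolve(uri: str) -> str:
    """Return the canonical URI, or raise KeyError listing close matches."""
    if uri in CANONICAL:
        return uri
    if uri in ALIASES:
        return ALIASES[uri]
    known = sorted(CANONICAL.union(ALIASES))  # canonical URIs plus alias URIs
    hints = difflib.get_close_matches(uri, known, n=3)
    raise KeyError(f"unknown property {uri!r}; did you mean {hints}?")


def lookup(label: str, lang: str) -> str:
    """Lookup-service style: a label in the developer's language -> canonical URI."""
    return LABELS[(lang, label.strip().lower())]


if __name__ == "__main__":
    print(resolve(NS + "fr/destinataireDe"))  # http://example.org/rda/a/P50209
    print(lookup("Addressee of", "en"))       # http://example.org/rda/a/P50209
    try:
        resolve(NS + "P50290")  # transposed digits: the machine notices
    except KeyError as err:
        print(err)
```

Whichever form the developer writes, what gets stored is the canonical URI; the aliases and labels are data that can grow language by language without touching existing triples.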
>
> So the consequence is that the quality of the uses of your ontology will go down. If there were 16 fields, maybe there'd be a chance of getting it right. But 1600, with 5-digit identifiers, is asking for trouble. Compare MARC fields. We all love our 245$a, I know, but dc:title is a lot easier to recall. Now imagine those fields are (seemingly) random 5-digit codes without significant structure. And that there are 1600 of them. And you're asking the developer to use a graph structure that's likely unfamiliar to them.
>

Just to clarify: you (and others in the audience who think like you) would be fine with:

    rdaa:addresseeOf a rdf:Property ;
        owl:sameAs rdaa:P50209 .

but not:

    rdaa:P50209 a rdf:Property ;
        owl:sameAs rdaa:addresseeOf .

which both say precisely the same thing about the same resource. You'd also say that dozens or hundreds of lexical identifiers for the same thing, just to make life easier for developers, is a bad thing, and that best practice would be to coin a single, readable-in-English URI. I'm afraid that I won't ever agree with that perspective when producing data for global distribution and consumption.

I'm personally not entirely happy with hundreds of sameAs lexical URIs. An alternative would be a lookup service that, given a label, returns the canonical URI. But I think that's more of an inconvenience to the developer than the simple ability to use a memorable URI, based on a label in their language, and have it resolve (permanently) to a canonical, opaque URI when accessed by a machine: "Use 'em all, and let the machines figure it out."

> All in my opinion, and all debatable. I hope that your choice goes well for you,

I'd like to repeat: although I agree with that choice and I'm defending it here, it wasn't my choice. Not at all. And the concerns you express were well aired and very carefully considered before the choice was made.

> but would like other people to think about it carefully before following suit.
>

Me too!
:-)

Jon
...who now has to go deal with the consequences of an ill-considered decision to deploy an unfamiliar nginx server, on a tight deadline, instead of my happy buddy Apache