On Fri, Jan 24, 2014 at 7:56 AM, Jon Phipps <[log in to unmask]> wrote:
> Hi Rob, the conversation continues below...
>
> On Thu, Jan 23, 2014 at 7:01 PM, Robert Sanderson <[log in to unmask]> wrote:
>
> > Hi Jon,
> >
> > To present the other side of the argument so that others on the list
> > can make an informed decision...
>
> Thanks for reminding me that this is an academic panel discussion in
> front of an audience, rather than a conversation.
>
> > On Thu, Jan 23, 2014 at 4:22 PM, Jon Phipps <[log in to unmask]> wrote:
> >
> > > I've developed a quite strong opinion that vocabulary developers
> > > should not _ever_ think that they can understand the semantics of a
> > > vocabulary resource by 'reading' the URI.
> >
> > 100% Agreed. Good documentation is essential for any ontology, and it
> > has to be read to understand the semantics. You cannot just look at
> > oa:hasTarget, out of context, and have any idea what it refers to.
> >
> > However if that URI is readable it makes developers' lives much easier
> > in a lot of situations, and it has no additional cost. Opaque URIs for
> > predicates are the digital equivalent of thumbing your nose at the
> > people you should be courting -- the people who will actually use your
> > ontology in any practical sense. It says: We don't care about you
> > enough to make your life one step easier by having something that's
> > memorable. You will always have to go back to the ontology every time
> > and reread this documentation, over and over and over again.
>
> What you suggest is that an identifier (e.g. @azaroth42 or ORCID:
> 0000-0003-4441-6852 <https://orcid.org/0000-0003-4441-6852>) should
> always be readable as a convenience to the developer. RDA does provide a
> 'readable in the language of the reader' URI specifically as a
> convenience to the developer. A feature that I lobbied for.
> It's just not the /canonical/ URI, because it's an identifier of a
> property, not the property itself, and that property is independent of
> the language used to label it.
>
> It's the difference between Metadata Management Associates, PO Box 282,
> Jacksonville, NY 14854, USA (for people) and 14854-0282 (a perfectly
> functional complete address in the USA namespace), which is precisely
> the same identifier of that box for machines, and ultimately for the
> postmaster, who doesn't care whose name is on the box numbered 282, who
> only needs to know that highly memorable name when someone uses the
> convenience of not bothering to look up the box number and just sends
> mail addressed to us at 14854, or even just Jacksonville. And no, I
> don't want to start a URL vs. URI/URN/IRI discussion.
>
> > > Do you have some expectation that in order for the data to be useful
> > > your relational or object database identifiers must be readable?
> >
> > Identifiers for objects, no. The table names and field names? Yes. How
> > many DBAs do you know that create tables with opaque identifiers for
> > the column names? How many XML schemas do you know that use opaque
> > identifiers for the element names?
> >
> > My count is 0 from many many many instances. And the reason is the
> > same as having readable predicate URIs -- so that when you look at the
> > table, schema, ontology, triple or what have you, there is some
> > mnemonic value from the name to its intent.
>
> Our experience obviously differs in this regard. I've seen many, many
> databases that have relatively opaque column identifiers that were
> relabeled in the query to suit the audience for the query. I've seen
> many French databases, with French content, intended for a French
> audience, designed by French developers, that had French 'column
> headers'.
>
> The point here is that the identifiers /identify/ a property that exists
> independent of the language of the data being used to describe a
> resource. If RDA _had_ to pick a single language to satisfy your
> requirement for a single readable identifier, which one? To assume that
> the one language should be English says to the non-English-speaking
> world "We don't care about you enough to make your life one step easier
> by having something that's memorable".
>
> > > By whom, and in English? This to me is a frankly colonial assumption
> > > of the dominance of English in the world of metadata.
> >
> > In the world of computing in general. "for" "if" "while" ... all
> > English. While there are Turing-complete languages out there, the ones
> > that don't have real-world language constructions are toys, like
> > Whitespace for example. Even the lolcats programming language is more
> > usable than Whitespace.
> >
> > Again, it's a cost/value consideration. There are many people who will
> > understand English, and when developers program, they're surrounded by
> > it. If your intended audience is primarily people who speak French,
> > then you would be entirely justified in using URIs with labels from
> > French. Or Chinese, though the IRI expansion would be more of a pain :)
>
> Despite the fact that developers are surrounded by English, I've worked
> with many highly skilled developers who didn't speak or read English.
> Who relied on documentation and meetings in their own language. What RDA
> is trying to convey is the specific bibliographic knowledge, admittedly
> limited by the cultural context of the North American and European
> bibliographic communities, that can be broken down into classes of
> things and some properties of those things. An English URI is often
> nearly as opaque as a numeric URI to a non-English-speaking programmer
> and immediately communicates an Anglo-American bias.
>
> RDA's intended audience, as is the case with everything intended to
> function in the global web of data, is the entire world in every
> language. Identifying a thing using a culture- and language-specific
> word or phrase instantly biases the general understanding of that thing.
> And RDA is trying very hard to avoid that a priori cultural bias as much
> as possible.
>
> > > The proper understanding of the semantics, although still relatively
> > > minimal, is from the definition, not the URI.
> >
> > Yes. Any shortcuts to *understanding* rather than *remembering* are to
> > be avoided.
> >
> > > Our coining and inclusion of multilingual (eventually) lexical URIs
> > > based on the label is a concession to developers who feel that they
> > > can't effectively 'use' the vocabularies unless they can read the
> > > URIs.
> >
> > So in my opinion, as is everything in the mail of course, this is even
> > worse. Now instead of 1600 properties, you have 1600 * (number of
> > languages + 1) properties. And you're going to see them appearing in
> > uses of the ontology. Either stick with your opaque identifiers or
> > pick a language for the readable ones, and best practice would be
> > English, but doing both is a disaster in the making.
>
> Best practice is not ever English, for the non-English-speaking world.
>
> > > I grant that writing ad hoc SPARQL queries with opaque URIs can be
> > > intensely frustrating, but the vocabularies aren't designed
> > > specifically to support that incredibly narrow use case.
> >
> > Writing queries is something developers have to do to work with data.
> > More importantly, writing code that builds the triples in the first
> > place is something that developers have to do. And they have to get it
> > right ... which they likely won't do first time. There will be typos.
> > That P1523235 might be written into the code as P1533235 ... an
> > impossible to spot typo.
> > dc:title vs dc:titel ... a bit easier to spot, no?
>
> A machine trying to resolve a misspelled, non-existent URI is a much
> better spell-checker than any developer will ever be. The problem here
> is that if RDA truly wants to be multilingual, and avoid the cultural
> bias of English identifiers, then they either have to provide multiple
> lexical identifiers, or provide a lookup service, like many providers of
> resources identified by opaque identifiers.
>
> > So the consequence is that the quality of the uses of your ontology
> > will go down. If there were 16 fields, maybe there'd be a chance of
> > getting it right. But 1600, with 5-digit identifiers, is asking for
> > trouble.
> >
> > Compare MARC fields. We all love our 245$a, I know, but dc:title is a
> > lot easier to recall. Now imagine those fields are (seemingly) random
> > 5-digit codes without significant structure. And that there are 1600
> > of them. And you're asking the developer to use a graph structure
> > that's likely unfamiliar to them.
>
> Just to clarify:
>
> You (and others who think like you in the audience) would be fine with:
>
>     rdaa:addresseeOf a rdf:Property
>         owl:sameAs rdaa:P50209
>
> but not:
>
>     rdaa:P50209 a rdf:Property
>         owl:sameAs rdaa:addresseeOf
>
> Which both say precisely the same thing about the same resource. And
> that dozens or hundreds of lexical identifiers for the same thing, just
> to make life easier for developers, is a bad thing. And that best
> practice would be to coin a single, readable-in-English URI.
>
> I'm afraid that I won't ever agree with that perspective, when producing
> data for global distribution and consumption.
>
> I'm personally not entirely happy with hundreds of sameAs lexical URIs.
> An alternative would be a lookup service that, given a label, returned
> the canonical URI.
> But I think that's more of an inconvenience to the developer than the
> simple ability to use a memorable URI, based on a label in their
> language, and have it resolve (permanently) to a canonical, opaque URI
> when accessed by a machine: "Use 'em all, and let the machines figure it
> out."
>
> > All in my opinion, and all debatable. I hope that your choice goes
> > well for you,
>
> I'd like to repeat: just because I agree with that choice, and I'm
> defending it here, it wasn't my choice. Not at all. And the concerns you
> express were well-aired and very carefully considered before the choice
> was made.
>
> > but would like other people to think about it carefully before
> > following suit.
>
> Me too! :-)
>
> Jon
> ...who now has to go deal with the consequences of an ill-considered
> decision to deploy an unfamiliar nginx server, on a tight deadline,
> instead of my happy buddy Apache
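
For readers following the thread, the scheme Jon argues for (several language-specific lexical names that all resolve to one opaque canonical name, with unknown names failing at resolution time) might be sketched roughly as below. This is only an illustration under stated assumptions: the `ALIASES` table, the French local name `destinataireDe`, and the `canonical` helper are hypothetical and are not part of RDA's registry or any real service; only `addresseeOf` and `P50209` appear in the thread.

```python
# Hypothetical alias table: lexical local names in several languages all map
# to one opaque canonical local name. Only addresseeOf and P50209 come from
# the thread; destinataireDe is invented purely for illustration.
ALIASES = {
    "addresseeOf": "P50209",      # English lexical name (from the thread)
    "destinataireDe": "P50209",   # assumed French lexical name
}
CANONICAL = set(ALIASES.values())


def canonical(local_name: str) -> str:
    """Return the canonical opaque name for a lexical or opaque local name.

    Raises KeyError for anything unknown, so a typo in either an opaque
    identifier or a lexical one fails at resolution time instead of
    passing silently into the data.
    """
    if local_name in CANONICAL:
        return local_name
    return ALIASES[local_name]
```

Under this sketch `canonical("addresseeOf")` and `canonical("destinataireDe")` resolve to the same opaque name, while a mistyped `P50290` or `adresseeOf` raises an error, which is the "let the machines figure it out" behaviour described above. The cost Rob points to is also visible here: the alias table grows with every language added.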