LISTSERV 16.5 - CODE4LIB Archives

Hi Jon,

To present the other side of the argument so that others on the list can
make an informed decision...

On Thu, Jan 23, 2014 at 4:22 PM, Jon Phipps wrote:

> I've developed a quite strong opinion that vocabulary developers should not
> _ever_ think that they can understand the semantics of a vocabulary
> resource by 'reading' the URI.


100% Agreed. Good documentation is essential for any ontology, and it has
to be read to understand the semantics. You cannot just look at
oa:hasTarget, out of context, and have any idea what it refers to.

However, if that URI is readable it makes developers' lives much easier in a
lot of situations, and it has no additional cost. Opaque URIs for
predicates are the digital equivalent of thumbing your nose at the people
you should be courting -- the people who will actually use your ontology in
any practical sense.  It says: We don't care about you enough to make your
life one step easier by having something that's memorable. You will have
to go back to the ontology every time and reread the documentation,
over and over and over again.

> Do you have some expectation that in order
> for the data to be useful your relational or object database identifiers
> must be readable?


Identifiers for objects, no. The table names and field names? Yes. How many
DBAs do you know that create tables with opaque identifiers for the column
names?  How many XML schemas do you know that use opaque identifiers for
the element names?

My count is 0 from many many many instances.  And the reason is the same as
having readable predicate URIs -- so that when you look at the table,
schema, ontology, triple or what have you, there is some mnemonic value
from the name to its intent.
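
As a rough illustration of the column-name point, here is a minimal sketch
using Python's built-in sqlite3 module. The table and column names (book,
title, creator, t10023, c10001, c10002) are invented for the example and
come from no real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Readable names: the query explains itself.
conn.execute("CREATE TABLE book (title TEXT, creator TEXT)")
conn.execute("INSERT INTO book VALUES (?, ?)", ("Moby-Dick", "Herman Melville"))

# Opaque names (hypothetical): same data, but every query needs a lookup
# in the documentation to find out which column is which.
conn.execute("CREATE TABLE t10023 (c10001 TEXT, c10002 TEXT)")
conn.execute("INSERT INTO t10023 VALUES (?, ?)", ("Moby-Dick", "Herman Melville"))

readable = conn.execute(
    "SELECT title FROM book WHERE creator = ?", ("Herman Melville",)
).fetchall()
opaque = conn.execute(
    "SELECT c10001 FROM t10023 WHERE c10002 = ?", ("Herman Melville",)
).fetchall()

# Both queries return the same rows; only the first can be checked by eye.
print(readable == opaque)
```

The database is equally happy with either naming scheme. The difference is
only in what a person reading the query can verify without the documentation.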


> By whom, and in English? This to me is a frankly colonial
> assumption of the dominance of English in the world of metadata.


In the world of computing in general. "for" "if" "while" ... all English.
While there are Turing-complete languages out there, the ones that don't
have real-world language constructions are toys, like Whitespace for
example.  Even LOLCODE, the lolcats programming language, is more usable
than Whitespace.

Again, it's a cost/value consideration.  There are many people who will
understand English, and when developers program, they're surrounded by it.
If your intended audience is primarily people who speak French, then you
would be entirely justified in using URIs with labels from French. Or
Chinese, though the IRI expansion would be more of a pain :)
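
To show what the IRI expansion looks like in practice, here is a small
sketch that builds a property URI from a label with Python's
urllib.parse.quote. The namespace http://example.org/vocab/ and the
label_to_uri helper are assumptions made up for the example.

```python
from urllib.parse import quote

# Hypothetical namespace, for illustration only.
BASE = "http://example.org/vocab/"


def label_to_uri(label: str) -> str:
    """Percent-encode a label (as UTF-8) and append it to the namespace."""
    return BASE + quote(label, safe="")


for label in ("title", "créateur", "标题"):
    print(label, "->", label_to_uri(label))
```

An ASCII label survives unchanged, an accented French label picks up one
escape sequence, and a Chinese label becomes entirely percent-escapes
wherever the IRI has to be written as a plain URI.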



> The proper
> understanding of the semantics, although still relatively minimal, is from
> the definition, not the URI.


Yes. Any shortcuts to *understanding* rather than *remembering* are to be
avoided.



> Our coining and inclusion of multilingual
> (eventually) lexical URIs based on the label is a concession to developers
> who feel that they can't effectively 'use' the vocabularies unless they can
> read the URIs.


So in my opinion, as is everything in this mail of course, this is even
worse. Now instead of 1600 properties, you have 1600 * (number of languages
+ 1) properties. And you're going to see them appearing in uses of the
ontology. Either stick with your opaque identifiers or pick a language for
the readable ones (best practice would be English). Doing both is a
disaster in the making.
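
The arithmetic above, as a tiny sketch. The function name and the language
counts are made up for illustration; only the 1600 figure and the formula
come from the paragraph above.

```python
def property_uri_count(n_properties: int, n_languages: int) -> int:
    """One opaque URI plus one lexical URI per language, for each property."""
    return n_properties * (n_languages + 1)


for n_languages in (1, 3, 10):
    print(n_languages, "languages ->", property_uri_count(1600, n_languages), "URIs")
```

With three languages that is already 6400 distinct property URIs that could
turn up in data.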



>  I grant that writing ad
> hoc sparql queries with opaque URIs can be intensely frustrating, but the
> vocabularies aren't designed specifically to support that incredibly narrow
> use case.


Writing queries is something developers have to do to work with data.  More
importantly, writing code that builds the triples in the first place is
something that developers have to do. And they have to get it right ...
which they likely won't do the first time. There will be typos. That P1523235
might be written into the code as P1533235 ... an impossible-to-spot typo.
dc:title vs dc:titel ... a bit easier to spot, no?
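
A minimal sketch of how those two typos behave under a simple validity
check. The vocabulary here is entirely hypothetical, and it assumes the
opaque identifier space is dense enough that the mistyped P1533235 happens
to be another valid property.

```python
import difflib

# Hypothetical vocabulary, expressed both ways (not a real ontology).
OPAQUE = {"P1523235": "title", "P1533235": "publisher", "P1523236": "creator"}
READABLE = ["title", "publisher", "creator"]


def check_opaque(pid: str) -> bool:
    """True if the opaque identifier exists in the vocabulary."""
    return pid in OPAQUE


def check_readable(name: str) -> bool:
    """True if the readable name exists in the vocabulary."""
    return name in READABLE


def suggest(name: str) -> list:
    """Propose close matches for a misspelled readable name."""
    return difflib.get_close_matches(name, READABLE)


# The mistyped opaque ID is still valid, so the check passes and the data
# is silently wrong: a title is recorded as a publisher.
print(check_opaque("P1533235"), OPAQUE["P1533235"])

# The mistyped readable name is not valid, so the check fails, and the
# intended name can even be suggested.
print(check_readable("titel"), suggest("titel"))
```

If the mistyped opaque ID did not exist, a validator would catch it too, but
a human reviewing the code still would not.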

So the consequence is that the quality of the uses of your ontology will go
down.  If there were 16 fields, maybe there'd be a chance of getting it
right. But 1600, with 5-digit identifiers, is asking for trouble.

Compare MARC fields. We all love our 245$a, I know, but dc:title is a lot
easier to recall. Now imagine those fields are (seemingly) random 5-digit
codes without significant structure. And that there are 1600 of them. And
you're asking the developer to use a graph structure that's likely
unfamiliar to them.

All in my opinion, and all debatable. I hope that your choice goes well for
you, but would like other people to think about it carefully before
following suit.

Rob