CODE4LIB Archives

CODE4LIB@LISTS.CLIR.ORG


CODE4LIB January 2014

Subject: Re: Fwd: [rules] Publication of the RDA Element Vocabularies
From: Robert Sanderson <[log in to unmask]>
Reply-To: Code for Libraries <[log in to unmask]>
Date: Fri, 24 Jan 2014 09:16:14 -0700
Content-Type: text/plain

(Sorry for a previous empty message)

Hi Jon,

On Fri, Jan 24, 2014 at 7:56 AM, Jon Phipps <[log in to unmask]> wrote:

> Hi Rob, the conversation continues below...
>
> On Thu, Jan 23, 2014 at 7:01 PM, Robert Sanderson <[log in to unmask]> wrote:
> > To present the other side of the argument so that others on the list can
> > make an informed decision...
> Thanks for reminding me that this is an academic panel discussion in front
> of an audience, rather than a conversation.
>

Heh :) I just meant that I wasn't trying to convince you to change, just
that I wanted to voice my concerns.
(But, yes, touché!)


> On Thu, Jan 23, 2014 at 4:22 PM, Jon Phipps <[log in to unmask]> wrote:
> >
> > However if that URI is readable it makes developers' lives much easier in
> > a lot of situations, and it has no additional cost. Opaque URIs for
> > predicates are the digital equivalent of thumbing your nose at the people
> > you should be courting
>
> What you suggest is that an identifier (e.g. @azaroth42 or ORCID:
> 0000-0003-4441-6852 <https://orcid.org/0000-0003-4441-6852>) should always
> be readable as a convenience to the developer.


Those are identifiers for objects or entities, not predicates.  As I said,
I'm happy for entities to have opaque URIs.  Where we disagree is on whether
you can carry that same rationale over to predicates/properties/relationships.


> RDA does provide a 'readable in the language of the reader' uri
> specifically as a convenience to the developer. A feature that I lobbied
> for. It's just not the /canonical/ URI, because it's an identifier of a
> property, not the property itself, and that property is independent of the
> language used to label it.
>

So this, IMO, is where the trouble starts.  People /will/ use those
convenience URIs. And that will make for a nightmare in terms of
interoperability (see below).



> It's the difference between Metadata Management Associates, PO Box 282,
> Jacksonville, NY 14854, USA (for people) and 14854-0282 (a perfectly
> functional complete address in the USA namespace), which is precisely the
> same identifier of that box for machines


Which is also an entity, not a predicate. I almost said "property" there,
which would be amusingly incorrect.



> > > Do you have some expectation that in order
> > > for the data to be useful your relational or object database identifiers
> > > must be readable?
> >
> > Identifiers for objects, no. The table names and field names? Yes. How many
> > DBAs do you know that create tables with opaque identifiers for the column
> > names?  How many XML schemas do you know that use opaque identifiers for
> > the element names?
> >
> > My count is 0 from many many many instances.  And the reason is the same as
> > having readable predicate URIs -- so that when you look at the table,
> > schema, ontology, triple or what have you, there is some mnemonic value
> > from the name to its intent.
>
> Our experience obviously differs in this regard. I've seen many, many
> databases that have relatively opaque column identifiers that were
> relabeled in the query to suit the audience for the query. I've seen many
> French databases, with French content, intended for a French audience,
> designed by French developers, that had French 'column headers'.
>

Yes, but French column headers are not opaque. How many schemas have
completely opaque, non-linguistic column headers, element names, etc?
I'm not talking "relatively opaque", I mean "P12345" or similar. I didn't
count MARC in my 0, which is strictly true as it's not XML or a relational
table, but you could say 1 to be fair.

Yes, sometimes they're PrpCtr or similar, but that's at least somewhat
readable (Property Counter, perhaps?) compared to a UUID or random integer.


> The point here is that the identifiers /identify/ a property that exists
> independent of the language of the data being used to describe a resource.
> If RDA _had_ to pick a single language to satisfy your requirement for a
> single readable identifier, which one? To assume that the one language
> should be English says to the non-english speaking world "We don't care
> about you enough to make your life one step easier by having something
> that's memorable"
>

My problem is not with the idea that properties exist independently of
language, it's the side effect of not picking a language to use.  If you
had to pick one, then you should pick one.  If you want to make a political
stand, don't pick English. But at least pick one, and only one.

Not caring about the non-English speaking world is at least caring about
some people, rather than no one.  Or the non-French speaking world.


> Despite the fact that developers are surrounded by English I've worked with
> many highly skilled developers who didn't speak or read English. Who relied
> on documentation and meetings in their own language.


Likewise, though admittedly primarily European languages rather than Asian.
 However even if someone doesn't speak English (or Italian, or French, or
German), a language-based construct is more memorable than a completely
opaque one.


> An English URI is often nearly as opaque as a numeric URI to a
> non-English-speaking programmer and immediately communicates an
> Anglo-American bias.
>

"often nearly"? :)   That sounds almost like you're saying there are times
when a linguistic URI is still okay.



> RDA's intended audience, as is the case with everything intended to
> function in the global web of data, is the entire world in every language.
> Identifying a thing using a cultural and language specific word or phrase
> instantly biases the general understanding of that thing. And RDA is trying
> very hard to avoid that a priori cultural bias as much as possible.
>

Which is admirable, certainly, but ultimately damaging.  A 100% politically
correct but unused vocabulary doesn't really help anyone.


> > > I grant that writing ad
> > > hoc sparql queries with opaque URIs can be intensely frustrating, but the
> > > vocabularies aren't designed specifically to support that incredibly
> > > narrow use case.
> >
> > Writing queries is something developers have to do to work with data.  More
> > importantly, writing code that builds the triples in the first place is
> > something that developers have to do. And they have to get it right ...
> > which they likely won't do first time. There will be typos. That P1523235
> > might be written into the code as P1533235 ... an impossible to spot typo.
> > dc:title vs dc:titel ... a bit easier to spot, no?
>
> A machine trying to resolve a mis-spelled, non-existent URI is a much
> better spell-checker than any developer will ever be.


Non-existent, sure. But the chances are high that there will be collisions
due to typos and you'll be assigning subjects of street addresses.

Combined with # rather than / and you have to parse the response to
determine whether or not the predicate exists. And isn't "Introduction" or
"References".   Secondly, a machine can equally easily determine that
.../title does exist when .../titel does not, so I fail to see how opaque
identifiers are any better.
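
The typo argument above can be illustrated with a minimal sketch. The
vocabulary contents are made up for illustration: it simply assumes that both
P1523235 and P1533235 are defined, which the thread does not confirm, and
`check_predicate` is a hypothetical helper rather than part of any RDF library.

```python
# Hypothetical local copies of two vocabularies. The opaque set assumes,
# purely for illustration, that both P1523235 and P1533235 are defined.
OPAQUE_VOCAB = {"P50209", "P1523235", "P1533235"}
READABLE_VOCAB = {"title", "creator", "addresseeOf"}


def check_predicate(name, vocab):
    """Return True if the predicate name is defined in the vocabulary."""
    return name in vocab


# A one-digit slip in an opaque identifier can land on another defined
# identifier, so an existence check passes and the typo goes unnoticed.
print(check_predicate("P1533235", OPAQUE_VOCAB))

# A misspelled readable name is unlikely to collide, so the check fails.
print(check_predicate("titel", READABLE_VOCAB))
```

An existence check catches a misspelling only when the misspelled form is not
itself a defined term, which is the collision risk raised above.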



> Just to clarify:
> You (and others who think like you in the audience) would be fine with:
> rdaa:addresseeOf a rdf:Property
>     owl:sameAs rdaa:P50209
> but not:
> rdaa:P50209 a rdf:Property
>     owl:sameAs rdaa:addresseeOf
>

No. Either make your political standpoint and stick with rdaa:P50209, OR
use a memorable URI like rdaa:addresseeOf, but do not do both.




> And that dozens or hundreds of lexical identifiers for the same thing, just
> to make life easier for developers is a bad thing. And that best practice
> would be to coin a single, readable-in-English URI.
>

Yes. Or non English, but the rest of the RDF (and computing) world has
picked English.


> I'm afraid that I won't ever agree with that perspective, when producing
> data for global distribution and consumption.
>

And hence my opening line :)



> I'm personally not entirely happy with hundreds of sameAs lexical URIs.


I think you meant hundreds /of thousands/ of, right?
1600 * (number of languages in the world +1) ?
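
As a rough worked example of that estimate: the figure of 1600 properties is
taken from the line above, while the language count of 100 is an arbitrary
assumption chosen only to show the scale.

```python
PROPERTIES = 1600  # figure used in the message above


def total_uris(languages):
    """One lexical URI per property per language, plus one canonical URI each."""
    return PROPERTIES * (languages + 1)


# 100 languages is an arbitrary assumption for illustration only.
print(total_uris(100))  # 161600
```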


> An alternative would be a lookup service that given a label returned the
> canonical URI. But I think that's more of an inconvenience to the developer
> than the simple ability to use a memorable URI, based on a label in their
> language, and have it resolve (permanently) to a canonical, opaque URI when
> accessed by a machine: "Use 'em all, and let the machines figure it out."
>

Let the developers write code to have the machines figure it out.  And let
the server at your end deal with sustained lookups all day, every day.  The
W3C has to throttle requests against their DTDs, which is one lookup per
instance.  You're suggesting that EVERY triple require a dereference.

A lookup table, even locally, of hundreds of thousands of URI mappings is
not something anyone wants to deal with. Even, I'll bet, Malay developers
who don't speak English.
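
The lookup cost being debated can be pictured with a small sketch. Everything
here is hypothetical: `dereference` stands in for an HTTP request to the
vocabulary server, the alias table holds only the one pair used as an example
in this thread, and the triples are invented.

```python
from functools import lru_cache

# Hypothetical alias table: lexical URI -> canonical opaque URI.
ALIASES = {"rdaa:addresseeOf": "rdaa:P50209"}

remote_calls = 0


def dereference(uri):
    """Stand-in for an HTTP lookup against the vocabulary server."""
    global remote_calls
    remote_calls += 1
    return ALIASES.get(uri, uri)


@lru_cache(maxsize=None)
def canonical(uri):
    """Resolve a predicate once, then answer from the local cache."""
    return dereference(uri)


# Invented data: 1000 triples that all use the same lexical predicate.
triples = [("ex:person%d" % i, "rdaa:addresseeOf", "ex:letter") for i in range(1000)]

# Naive approach: one remote lookup per triple.
naive = [(s, dereference(p), o) for s, p, o in triples]
naive_calls = remote_calls

# Cached approach: one remote lookup per distinct predicate.
remote_calls = 0
cached = [(s, canonical(p), o) for s, p, o in triples]
cached_calls = remote_calls

print(naive_calls, cached_calls)  # 1000 1
```

Caching reduces the load on the server, but the client still has to build and
maintain the local mapping, which is the burden described above.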


> > All in my opinion, and all debatable. I hope that your choice goes well
> > for you,
>
> I'd like to repeat: just because I agree with that choice, and I'm
> defending it here, it wasn't my choice. Not at all. And the concerns you
> express were well-aired and very carefully considered before the choice was
> made.
>

And yours :)


> > but would like other people to think about it carefully before
> > following suit.
> >
> Me too! :-)
> Jon
> ...who now has to go deal with the consequences of an ill-considered
> decision to deploy an unfamiliar nginx server, on a tight deadline, instead
> of my happy buddy Apache
>

Best of luck! :)

Rob
