LISTSERV 16.5 - CODE4LIB Archives

On Wed, Apr 15, 2009 at 00:20, Jonathan Rochkind <[log in to unmask]> wrote:
> Can you show me where this definition of a "URL" vs. a "URI" is made in any RFC or standard-like document?

From http://www.faqs.org/rfcs/rfc3986.html ;

1.1.3.  URI, URL, and URN

   A URI can be further classified as a locator, a name, or both.  The
   term "Uniform Resource Locator" (URL) refers to the subset of URIs
   that, in addition to identifying a resource, provide a means of
   locating the resource by describing its primary access mechanism
   (e.g., its network "location").  The term "Uniform Resource Name"
   (URN) has been used historically to refer to both URIs under the
   "urn" scheme [RFC2141], which are required to remain globally unique
   and persistent even when the resource ceases to exist or becomes
   unavailable, and to any other URI with the properties of a name.

   An individual scheme does not have to be classified as being just one
   of "name" or "locator".  Instances of URIs from any given scheme may
   have the characteristics of names or locators or both, often
   depending on the persistence and care in the assignment of
   identifiers by the naming authority, rather than on any quality of
   the scheme.  Future specifications and related documentation should
   use the general term "URI" rather than the more restrictive terms
   "URL" and "URN" [RFC3305].

As you can see, a URI is an identifier and a URL is a locator
(a mechanism for retrieval), and since URLs are a subset of URIs, you
_can_ resolve URIs as well.

> Sure, we have a _sense_ of how the connotation is different, but
> I don't think that sense is actually formalized anywhere.

It is, and the same stuff is documented in Wikipedia as well ;

   http://en.wikipedia.org/wiki/Uniform_Resource_Identifier
   http://en.wikipedia.org/wiki/Uniform_Resource_Locator

> I think the sem web crowd actually embraces this confusingness,

No, I think they take it at face value; they (the URIs) are
identifiers for things, and can be used for just that purpose, but
they are also URLs, which means they resolve to something. What I think
you're getting at is the "something" it resolves to, as *that*
has no definition. But then, if you go from RDF to Topic Maps PSIs
(PSIs are URIs with an extended meaning), *that* thing it resolves to
indeed has a definition; it's the prose explaining what the identifier
identifies, and this is the most important difference between RDF and
Topic Maps (and a very subtle but important difference, too).

> they want to have it both ways: Oh, a URI doesn't need to resolve,
> it's just an opaque identifier; but you really should use http URIs
> for all URIs; why? because it's important that they resolve.

I smell straw-man. :) But yes, they do want both, as having both is in
fact a friggin' smart thing. We all deal with identifiers all the
time, in internal as well as external applications, so why not use an
identifier scheme that has the added bonus of a resolver
mechanism? If you want to be stupid and lock yourself in your limited
world, then using them as just identifiers is fine but perhaps a bit,
well, stupid. But if you want to be smart about it, realizing that
without ontological work there will *never* be proper interop, you use
those identifiers and let them resolve to something. And if you're
really smart, you let them resolve to either more RDF statements, or,
if you're seriously Einsteinly smart, use PSIs (as in Topic Maps) :).

> In general, combining two functions in one mechanism is a
> dangerous and confusing thing to do in data design, in my opinion.

Because ... ?

> By analogy, it's what gets a lot of MARC/AACR2 into trouble.

Hmm, and I thought it was crap design that did that, coupled with poor
metadata constraints and validation channels, untyped fields, poor
tooling, the lack of machine understandability, and the general
library idiom of "not invented here". But correct me if I'm wrong. :)

> Over in: http://www.w3.org/2001/tag/doc/URNsAndRegistries-50-2006-08-17.html

Umm, I'd be wary of taking as canon a draft with editorial notes going
back 4 to 5 years that still aren't resolved. In other words, this
document isn't relevant to the real world. Yet.

> They suggest: "URI opacity    'Agents making use of URIs SHOULD NOT attempt to infer properties of the referenced resource.'"

Well, as a RESTafarian I understand this argument quite well. It's
about not assuming too much from the internal structure of the URI.
Again, it's an identifier, not a scheme such as a URL where structure
is defined. Again, for URIs, don't assume structure because at this
point it isn't a URL.

> If I get a URI representing (eg) a Sudoc (or an ISSN, or an LCCN), I need to
> be able to tell from the URI alone that it IS a Sudoc, AND I need to be able
> to extract the actual SuDoc identifier from it.  That completely violates their
> Opacity requirement

I think you are quite mistaken on this, but before we leap into whether
the web is suitable for SuDoc I'd rather point out that SuDoc isn't
web friendly in itself, and *that* more than anything stands in the
way of using them with the web. Also, having a unified resolver for
SuDoc isn't hard; it can be at a fixed URL and use a parameter for
identifiers. You don't need to snoop the non-parameterized section of
a URI to get the IDs ;

   http://somewhere.com/sudoc?id=Y%203.C%2076/3:2%20K%2054

It's up to the server side to deal with that URI and pick out the
parameters as it sees fit, not the creator of the URI (i.e. the
client).
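
As a rough sketch of the round trip (Python standard library only;
somewhere.com and the "id" parameter are just the placeholders from the
example URL, not a real resolver, and the function names are made up),
the client percent-encodes the SuDoc number into the query string and
the server side picks it back out:

```python
from urllib.parse import parse_qs, quote, urlsplit

# Placeholder resolver endpoint from the example URL; not a real service.
RESOLVER = "http://somewhere.com/sudoc"


def sudoc_uri(sudoc: str) -> str:
    """Client side: put the SuDoc number into the 'id' query parameter."""
    # Leave '/' and ':' readable; percent-encode spaces and the like.
    return f"{RESOLVER}?id={quote(sudoc, safe='/:')}"


def sudoc_from_uri(uri: str) -> str:
    """Server side: pick the 'id' parameter back out of the query string."""
    params = parse_qs(urlsplit(uri).query)
    return params["id"][0]


uri = sudoc_uri("Y 3.C 76/3:2 K 54")
print(uri)
print(sudoc_from_uri(uri))
```

Nothing outside the query string has to be inspected to recover the
identifier, which is the point of using a parameter.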

> but it's entirely infeasible to require me to make an individual HTTP request
> for every URI I find, to figure out what it IS.

No it's not; if you design your system RESTfully (which, indeed, HTTP
is) then the discovery part can be fast, cached, and, using URI
templates embedded in HTTP responses, fully flexible and fit for your
purposes.
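
One way to picture cached discovery, as a Python sketch only. Everything
in it is hypothetical: the template string, the tiny `expand` helper
(which handles just a "{?id}" query form), and `describe`, which stands
in for a real HTTP GET that would honor the server's cache headers.

```python
from functools import lru_cache
from urllib.parse import quote

# Hypothetical URI template a server might advertise in a response.
TEMPLATE = "http://somewhere.com/sudoc{?id}"

fetch_count = 0  # counts how often the "network" is actually hit


def expand(template: str, ident: str) -> str:
    """Expand the '{?id}' part of the template with a percent-encoded value."""
    return template.replace("{?id}", "?id=" + quote(ident, safe=""))


@lru_cache(maxsize=1024)
def describe(uri: str) -> str:
    """Stand-in for an HTTP GET; the result is cached per URI."""
    global fetch_count
    fetch_count += 1
    return f"description of {uri}"


uri = expand(TEMPLATE, "Y 3.C 76/3:2 K 54")
describe(uri)
describe(uri)  # second call is served from the cache
print(uri, fetch_count)
```

The client only needs the template and a value; it never has to guess
at the structure of the URI, and repeated lookups cost nothing.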

> But I just want a darn SuDoc in a URI -- and there are advantages to
> putting a SuDoc in a URI _precisely_ so it can be used in URI-using
> infrastructures like RDF, and these advantages hold _even if_ it's not
> resolvable and we ignore the 'opacity' reccommendation.

   http://somewhere.com/sudoc?id=Y%203.C%2076/3:2%20K%2054

The trick isn't to do it; the trick is to have people agree on how to
do it. The mechanisms are there, ripe for the taking, but you still
need people to agree on how it's done. Off you go, start that
conversation with others that have an interest in SuDoc.


Kind regards,

Alex
-- 
---------------------------------------------------------------------------
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
------------------------------------------ http://shelter.nu/blog/ --------