Hi Owen, all:
This is a very interesting problem.
At Tue, 15 Sep 2009 10:04:09 +0100,
> If we look at a website it is pretty difficult to reference it
> without including the URL - it seems to be the only good way of
> describing what you are actually talking about (how many people
> think of websites by 'title', 'author' and 'publisher'?). For me,
> this leads to an immediate confusion between the description of the
> resource and the route of access to it. So, to differentiate I'm
> starting to think of the http URI in a reference like this as a URI,
> but not necessarily a URL. We then need some mechanism to check,
> given a URI, what is the URL.
> The problem with the approach (as Nate and Eric mention) is that any
> approach that relies on the URI as an identifier (whether using
> OpenURL or a script) is going to have problems as the same URI could
> be used to identify different resources over time. I think Eric's
> suggestion of using additional information to help differentiate is
> worth looking at, but I suspect that this is going to cause us
> problems - although I'd say that it is likely to cause us much less
> work than the alternative, which is allocating every single
> reference to a web resource used in our course material its own
> persistent URL.
I might be misunderstanding you, but I think that you are leaving out
the implicit dimension of time here - when was the URL referenced?
What can we use to represent the tuple <URL, date>, and how do we
retrieve an appropriate representation of this tuple? Is the most
appropriate representation the most recent version of the page,
wherever it may have moved? Or is the most appropriate representation
the page as it existed in the past? I would argue that the most
appropriate representation would be the page as it existed in the
past, not what the page looks like now - but I am biased, because I
work in web archiving.
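
To make the <URL, date> idea concrete, here is a rough sketch in
Python. Everything in it is my own illustration, not an existing
system: the SnapshotIndex class, its add/resolve methods, and the
rule "return the latest capture made on or before the date the URL
was referenced" are all assumptions. A real archive might prefer the
nearest capture in either direction.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import date


@dataclass
class SnapshotIndex:
    """Hypothetical in-memory index of archived captures, keyed by URL."""

    # Maps url -> list of (capture_date, content), kept sorted by date.
    captures: dict = field(default_factory=dict)

    def add(self, url, captured_on, content):
        """Record that `content` was captured from `url` on `captured_on`."""
        entries = self.captures.setdefault(url, [])
        entries.append((captured_on, content))
        entries.sort(key=lambda entry: entry[0])

    def resolve(self, url, referenced_on):
        """Resolve the tuple <url, referenced_on>.

        Assumed policy: return the latest capture made on or before the
        reference date, or None if no such capture exists.
        """
        entries = self.captures.get(url, [])
        dates = [captured_on for captured_on, _ in entries]
        i = bisect_right(dates, referenced_on)
        return entries[i - 1] if i else None


index = SnapshotIndex()
index.add("http://example.org/", date(2009, 1, 1), "version 1")
index.add("http://example.org/", date(2009, 9, 1), "version 2")
index.add("http://example.org/", date(2010, 3, 1), "version 3")

# A reference made on 15 Sep 2009 resolves to the capture of 1 Sep 2009.
hit = index.resolve("http://example.org/", date(2009, 9, 15))
```

The point of the sketch is only that the date is part of the lookup
key: the same URL resolves to different content depending on when it
was referenced.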
Unfortunately this is a problem that has not been very well addressed
by the web architecture people, or the web archiving people. The web
architecture people start from the assumption that
<http://example.org/> is the same resource which only varies in its
representation as a function of time, not in its identity as a
resource. The web archiving people create closed systems and do not
think about how to store and resolve the tuple, <URL, date>.
I know this doesn’t help with your immediate problem, but I think
these are important issues.
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3