Thanks Erik,

Yes - generally references to web sites require a 'route of access' (i.e. URL) and 'date accessed' - because, of course, the content of the website may change over time.

Strictly you are right - if you are going to link to the resource, the link should go to the version of the page that was available at the time the author accessed it. This time aspect is something I'm thinking about more as a result of the conversations on this thread. The 'date accessed' seems like a good way of distinguishing between the possible resolutions of a single URL. Unfortunately references don't have a specified format for the date, and it can be expressed in a variety of ways - typically you'll see something like 'Accessed 14 September 2009', but as far as I know it could be 'Accessed 14/09/09' or I guess 'Accessed 09/14/09' etc.
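
To make the date-format problem concrete, here is a minimal Python sketch (not something we have built). It tries each format in turn and returns every date an 'Accessed ...' statement could mean. The function name and the list of formats are hypothetical, chosen only to cover the three examples above, and the month-name format assumes an English locale.

```python
from datetime import date, datetime

# Hypothetical list of formats, covering only the three examples above:
# '14 September 2009', '14/09/09' (day first) and '09/14/09' (month first).
CANDIDATE_FORMATS = ["%d %B %Y", "%d/%m/%y", "%m/%d/%y"]


def possible_access_dates(reference_text):
    """Return the set of calendar dates an 'Accessed ...' statement could mean."""
    prefix = "Accessed "
    if not reference_text.startswith(prefix):
        return set()
    raw = reference_text[len(prefix):].strip()
    dates = set()
    for fmt in CANDIDATE_FORMATS:
        try:
            dates.add(datetime.strptime(raw, fmt).date())
        except ValueError:
            # The text does not fit this format; try the next one.
            pass
    return dates


if __name__ == "__main__":
    for text in ("Accessed 14 September 2009", "Accessed 14/09/09",
                 "Accessed 09/14/09", "Accessed 03/04/09"):
        print(text, "->", sorted(possible_access_dates(text)))
```

A statement such as 'Accessed 14/09/09' resolves to a single date only because 14 cannot be a month. Something like 'Accessed 03/04/09' fits both numeric formats and yields two candidate dates, so a <URL, date> pair built from it would be ambiguous without knowing which convention the author used.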

It is also true that the intent of a reference can vary - sometimes the intent is to point at a website, and sometimes to point to the content of a website at a moment in time. Thinking loosely in FRBR terms, I guess you'd say that sometimes you want to reference the work/expression, and sometimes the manifestation - although I know FRBR gets complicated when you look at digital representations, which is a whole other discussion.

To be honest, our project is not going to delve into this too much - we are limited both by time (we finish in February) and by practicalities (I just don't think the library/institution is going to want to look at snapshotting websites, or finding archived versions for each course we run - I suspect it would be less effort to update the course to use a more current reference in the cases where this problem really manifests itself).

One of the other things I've come to realise is that although it is nice to be able to access material that is referenced, the reference primarily recognises the work of others and puts your work into context - access is only a secondary concern. It is perfectly possible and OK to reference material that is not generally available: as a reader I may not have access to certain material, and over time material is destroyed, so when referencing rare or unique texts it may eventually become impossible to access the referenced source at all.

I think for research publications there is a genuine and growing issue - especially when we start to consider the referencing of datasets, which is just starting to become common practice in scientific research. If the dataset grows over time, will it be possible to see the version of the dataset that was used for a specific piece of research?


Owen Stephens
TELSTAR Project Manager
Library and Learning Resources Centre
The Open University
Walton Hall
Milton Keynes, MK7 6AA

T: +44 (0) 1908 858701
F: +44 (0) 1908 653571
E: [log in to unmask]

> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On
> Behalf Of Erik Hetzner
> Sent: 15 September 2009 18:12
> To: [log in to unmask]
> Subject: Re: [CODE4LIB] Implementing OpenURL for simple web resources
> Hi Owen, all:
> This is a very interesting problem.
> At Tue, 15 Sep 2009 10:04:09 +0100,
> O.Stephens wrote:
> > [...]
> >
> > If we look at a website it is pretty difficult to reference it without
> > including the URL - it seems to be the only good way of describing
> > what you are actually talking about (how many people think of websites
> > by 'title', 'author' and 'publisher'?). For me, this leads to an
> > immediate confusion between the description of the resource and the
> > route of access to it. So, to differentiate I'm starting to think of
> > the http URI in a reference like this as a URI, but not necessarily a
> > URL. We then need some mechanism to check, given a URI, what is the
> > URL.
> >
> > [...]
> >
> > The problem with the approach (as Nate and Eric mention) is that any
> > approach that relies on the URI as an identifier (whether using OpenURL
> > or a script) is going to have problems as the same URI could be used
> > to identify different resources over time. I think Eric's suggestion
> > of using additional information to help differentiate is worth looking
> > at, but I suspect that this is going to cause us problems - although
> > I'd say that it is likely to cause us much less work than the
> > alternative, which is allocating every single reference to a web
> > resource used in our course material its own persistent URL.
> > [...]
> I might be misunderstanding you, but, I think that you are
> leaving out the implicit dimension of time here - when was
> the URL referenced?
> What can we use to represent the tuple <URL, date>, and how
> do we retrieve an appropriate representation of this tuple?
> Is the most appropriate representation the most recent
> version of the page, wherever it may have moved? Or is the
> most appropriate representation the page as it existed in the
> past? I would argue that the most appropriate representation
> would be the page as it existed in the past, not what the
> page looks like now - but I am biased, because I work in web
> archiving.
> Unfortunately this is a problem that has not been very well
> addressed by the web architecture people, or the web
> archiving people. The web architecture people start from the
> assumption that <> is the same resource
> which only varies in its representation as a function of
> time, not in its identity as a resource. The web archives
> people create closed systems and do not think about how to
> store and resolve the tuple, <URL, date>.
> I know this doesn't help with your immediate problem, but I
> think these are important issues.
> best,
> Erik Hetzner
> ;; Erik Hetzner, California Digital Library ;; gnupg key id:
> 1024D/01DB07E3

The Open University is incorporated by Royal Charter (RC 000391), an exempt charity in England & Wales and a charity registered in Scotland (SC 038302).