Hi Erik, all

On Tue, Sep 15, 2009 at 1:12 PM, Erik Hetzner <[log in to unmask]> wrote:
> I might be misunderstanding you, but, I think that you are leaving out
> the implicit dimension of time here - when was the URL referenced?
> What can we use to represent the tuple <URL, date>, and how do we
> retrieve an appropriate representation of this tuple? Is the most
> appropriate representation the most recent version of the page,
> wherever it may have moved? Or is the most appropriate representation
> the page as it existed in the past? I would argue that the most
> appropriate representation would be the page as it existed in the
> past, not what the page looks like now - but I am biased, because I
> work in web archiving.
>
> Unfortunately this is a problem that has not been very well addressed
> by the web architecture people, or the web archiving people. The web
> architecture people start from the assumption that
> <http://example.org/> is the same resource which only varies in its
> representation as a function of time, not in its identity as a
> resource. The web archives people create closed systems and do not
> think about how to store and resolve the tuple, <URL, date>.

I haven't been following this thread completely, but you've taken it
in an interesting direction. I think you've succinctly described the
issue with using URLs as references in an academic context: that the
integrity of the URL is a function of time. As John Kunze has said:
"Just because the URI was the last to see a resource alive doesn't
mean it killed them" :-)

I'm sure you've seen this, but the Internet Archive has a nice URL
pattern for referencing a representation of a resource at a point in time:

  http://web.archive.org/web/{year}{month}{day}{hour}{minute}{seconds}/{url}

So for example you can reference Google's homepage on December 2, 1998
at 23:04:10 with this URL:

  http://web.archive.org/web/19981202230410/http://www.google.com/
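
To make the pattern concrete, here is a minimal Python sketch that
builds and takes apart URLs of this shape. The function names are my
own, purely for illustration, and the datetimes are naive because the
pattern itself carries no timezone:

```python
from datetime import datetime

# Prefix taken from the Internet Archive example above.
WAYBACK_PREFIX = "http://web.archive.org/web/"
STAMP_FORMAT = "%Y%m%d%H%M%S"  # {year}{month}{day}{hour}{minute}{seconds}


def wayback_url(url, when, prefix=WAYBACK_PREFIX):
    """Build a <URL, date> reference following the pattern above."""
    return f"{prefix}{when.strftime(STAMP_FORMAT)}/{url}"


def parse_wayback_url(archived, prefix=WAYBACK_PREFIX):
    """Split an archived URL back into (original URL, datetime)."""
    if not archived.startswith(prefix):
        raise ValueError("URL does not start with the expected prefix")
    # Everything up to the first slash after the prefix is the timestamp.
    stamp, sep, url = archived[len(prefix):].partition("/")
    if not sep:
        raise ValueError("no original URL found after the timestamp")
    return url, datetime.strptime(stamp, STAMP_FORMAT)


if __name__ == "__main__":
    print(wayback_url("http://www.google.com/",
                      datetime(1998, 12, 2, 23, 4, 10)))
```

This only manipulates strings; it says nothing about whether the
archive actually holds a capture at that exact second.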

As Mike's email points out, this is only good as long as Internet
Archive is up and running the way we expect it to. Having any one
organization shoulder this burden isn't particularly scalable, or
realistic IMHO. But luckily the open and distributed nature of the web
allows other organizations to do the same thing--like the great work
you all are doing at the California Digital Library [1] and similar
efforts like WebCite [2]. It would be kinda nice if these web
archiving solutions sported similar URI patterns to enable discovery.
For example it looks like:

  http://webarchives.cdlib.org/sw1jd4pq4k/http://books.nap.edu/html/id_questions/appB.html

references a frame that surrounds an actual representation in time:

  http://webarchives.cdlib.org/wayback.public/NYUL_ag_3/20090320202246/http://books.nap.edu/html/id_questions/appB.html

Which is quite similar to Internet Archive's URI pattern -- not
surprising given the common use of Wayback [3]. But there are some
differences. It might be nice to promote some URI patterns for web
archiving services, so that we could theoretically create applications
that federate searches for a known resource at a given time. I guess in
part OpenURL was designed to fill this space, but it might instead be
a bit more natural to define a URI pattern that approximated what
Wayback does, and come up with some way of sharing archive locations.
I'm not sure if that last bit made any sense, or if some attempt at
this has been made already. Maybe something to talk about at iPRES?
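
Roughly what I have in mind, as a hedged Python sketch: if archives
shared a Wayback-like pattern, a client could take a <URL, date> pair
and generate one candidate URL per known archive. The two prefixes
below come from the examples in this message (the CDL one looks
collection-specific, so treat it as an assumption), and the shared
list of archive locations is hypothetical:

```python
from datetime import datetime

# Hypothetical shared list of archive locations. Both prefixes are
# copied from the example URLs above; nothing here is an official API.
ARCHIVE_PREFIXES = [
    "http://web.archive.org/web/",
    "http://webarchives.cdlib.org/wayback.public/NYUL_ag_3/",
]


def candidate_urls(url, when, prefixes=ARCHIVE_PREFIXES):
    """Return one Wayback-style candidate URL per archive prefix."""
    stamp = when.strftime("%Y%m%d%H%M%S")
    return [f"{prefix}{stamp}/{url}" for prefix in prefixes]


if __name__ == "__main__":
    page = "http://books.nap.edu/html/id_questions/appB.html"
    for candidate in candidate_urls(page, datetime(2009, 3, 20, 20, 22, 46)):
        print(candidate)
```

A real application would still have to fetch each candidate and see
what comes back; how each archive responds when it has no capture at
that exact time is not something this sketch assumes.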

I had hoped that the Zotero/InternetArchive collaboration would lead
to some more integration between scholarly use of the web and
archiving [3]. I guess there's still time?

//Ed

[1] http://webarchives.cdlib.org/
[2] http://www.webcitation.org/
[3] http://inkdroid.org/journal/2007/12/17/permalinks-reloaded/