Hi Erik, all

On Tue, Sep 15, 2009 at 1:12 PM, Erik Hetzner wrote:
> I might be misunderstanding you, but, I think that you are leaving out
> the implicit dimension of time here - when was the URL referenced?
> What can we use to represent the tuple <URL, date>, and how do we
> retrieve an appropriate representation of this tuple? Is the most
> appropriate representation the most recent version of the page,
> wherever it may have moved? Or is the most appropriate representation
> the page as it existed in the past? I would argue that the most
> appropriate representation would be the page as it existed in the
> past, not what the page looks like now - but I am biased, because I
> work in web archiving.
>
> Unfortunately this is a problem that has not been very well addressed
> by the web architecture people, or the web archiving people. The web
> architecture people start from the assumption that
> <http://example.org/> is the same resource which only varies in its
> representation as a function of time, not in its identity as a
> resource. The web archives people create closed systems and do not
> think about how to store and resolve the tuple, <URL, date>.

I haven't been following this thread completely, but you've taken it
in an interesting direction. I think you've succinctly described the
issue with using URLs as references in an academic context: that the
integrity of the URL is a function of time.

As John Kunze has said: "Just because the URI was the last to see a
resource alive doesn't mean it killed them" :-)

I'm sure you've seen this, but Internet Archive have a nice URL
pattern for referencing a resource representation in time:

  http://web.archive.org/web/{year}{month}{day}{hour}{minute}{seconds}/{url}

So for example you can reference Google's homepage on December 2, 1998
at 23:04:10 with this URL:

  http://web.archive.org/web/19981202230410/http://www.google.com/

As Mike's email points out, this is only good as long as Internet
Archive is up and running the way we expect it to. Having any one
organization shoulder this burden isn't particularly scalable, or
realistic IMHO. But luckily the open and distributed nature of the web
allows other organizations to do the same thing--like the great work
you all are doing at the California Digital Library [1] and similar
efforts like WebCite [2].

It would be kinda nice if these web archiving solutions sported
similar URI patterns to enable discovery. For example it looks like:

  http://webarchives.cdlib.org/sw1jd4pq4k/http://books.nap.edu/html/id_questions/appB.html

references a frame that surrounds an actual representation in time:

  http://webarchives.cdlib.org/wayback.public/NYUL_ag_3/20090320202246/http://books.nap.edu/html/id_questions/appB.html

Which is quite similar to Internet Archive's URI pattern -- not
surprising given the common use of Wayback [3]. But there are some
differences. It might be nice to promote some URI patterns for web
archiving services, so that we could theoretically create applications
that federate searches for a known resource at a given time. I guess
in part OpenURL was designed to fill this space, but it might instead
be a bit more natural to define a URI pattern that approximated what
Wayback does, and come up with some way of sharing archive locations.

I'm not sure if that last bit made any sense, or if some attempt at
this has been made already. Maybe something to talk about at iPRES?

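To make the <URL, date> idea a bit more concrete, here is a minimal
Python sketch of the Wayback-style pattern above. The function names
and the ARCHIVE_PREFIXES list are made up for illustration; the two
prefixes are just the ones that appear in the example URLs in this
message, not a registry that anybody actually publishes.

```python
from datetime import datetime

# Prefix taken from the Internet Archive example URL in this message.
WAYBACK_PREFIX = "http://web.archive.org/web/"

# Hypothetical list of Wayback-style archive prefixes. A shared way of
# publishing such a list is exactly what is missing today; these two
# entries are copied from the example URLs above.
ARCHIVE_PREFIXES = [
    "http://web.archive.org/web/",
    "http://webarchives.cdlib.org/wayback.public/NYUL_ag_3/",
]


def wayback_url(url, when, prefix=WAYBACK_PREFIX):
    """Build {prefix}{year}{month}{day}{hour}{minute}{seconds}/{url}."""
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"{prefix}{timestamp}/{url}"


def candidate_urls(url, when, prefixes=ARCHIVE_PREFIXES):
    """List the URLs a federated client could try for one <URL, date>."""
    return [wayback_url(url, when, prefix) for prefix in prefixes]


if __name__ == "__main__":
    # Google's homepage on December 2, 1998 at 23:04:10.
    print(wayback_url("http://www.google.com/",
                      datetime(1998, 12, 2, 23, 4, 10)))
```

Run as a script, this prints the Google example URL given above. A
real client would still have to fetch each candidate and cope with
archives that hold no capture at that exact timestamp, which this
sketch does not attempt.
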
I had hoped that the Zotero/InternetArchive collaboration would lead
to some more integration between scholarly use of the web and
archiving [3]. I guess there's still time?

//Ed

[1] http://webarchives.cdlib.org/
[2] http://www.webcitation.org/
[3] http://inkdroid.org/journal/2007/12/17/permalinks-reloaded/