At Fri, 18 Sep 2009 10:40:08 -0400,
Ed Summers wrote:
> 
> Hi Erik, all
>
> […]
>
> I haven't been following this thread completely, but you've taken it
> in an interesting direction. I think you've succinctly described the
> issue with using URLs as references in an academic context: that the
> integrity of the URL is a function of time. As John Kunze has said:
> "Just because the URI was the last to see a resource alive doesn't
> mean it killed them" :-)
> 
> I'm sure you've seen this, but Internet Archive have a nice URL
> pattern for referencing a resource representation in time:
> 
>   http://web.archive.org/web/{year}{month}{day}{hour}{minute}{seconds}/{url}
> 
> So for example you can reference Google's homepage on December 2, 1998
> at 23:04:10 with this URL:
> 
>   http://web.archive.org/web/19981202230410/http://www.google.com/
> 
> As Mike's email points out this is only good as long as Internet
> Archive is up and running the way we expect it to. Having any one
> organization shoulder this burden isn't particularly scalable, or
> realistic IMHO. But luckily the open and distributed nature of the
> web allows other organizations to do the same thing--like the great
> work you all are doing at the California Digital Library [1] and
> similar efforts like WebCite [2]. It would be kinda nice if these
> web archiving solutions sported similar URI patterns to enable
> discovery. For example it looks like:
> 
>   http://webarchives.cdlib.org/sw1jd4pq4k/http://books.nap.edu/html/id_questions/appB.html
> 
> references a frame that surrounds an actual representation in time:
> 
>   http://webarchives.cdlib.org/wayback.public/NYUL_ag_3/20090320202246/http://books.nap.edu/html/id_questions/appB.html
> 
> Which is quite similar to Internet Archive's URI pattern -- not
> surprising given the common use of Wayback [3]. But there are some
> differences. It might be nice to promote some URI patterns for web
> archiving services, so that we could theoretically create
> applications that federated search for a known resource at a given
> time. I guess in part OpenURL was designed to fill this space, but
> it might instead be a bit more natural to define a URI pattern that
> approximated what Wayback does, and come up with some way of sharing
> archive locations. I'm not sure if that last bit made any sense, or
> if some attempt at this has been made already. Maybe something to
> talk about at iPRES?
> 
> I had hoped that the Zotero/InternetArchive collaboration would lead
> to some more integration between scholarly use of the web and
> archiving [3]. I guess there's still time?
> 
> //Ed
> 
> [1] http://webarchives.cdlib.org/
> [2] http://www.webcitation.org/
> [3] http://inkdroid.org/journal/2007/12/17/permalinks-reloaded/

Hi Ed, code4libbers -

Sorry for the late reply, but I have been on vacation.

Thanks for the insightful comments. They are very much in line with
things I have been thinking and you have got me thinking along some
other lines as well.

Our system is based on crawls, so in your example sw1jd4pq4k is a
crawl id. We discussed using the .../20090101.../http://.. scheme
directly as in wayback, but decided to use crawl-based URLs as our
primary mechanism of entry, given the constraints of our system.

(By the way, the ...wayback.public... URL should not be relied on
for permanence!)

We would, however, like to support the use of Wayback-style URLs as
well. There is some interest in the web archiving community in
increasing interoperability between web archive systems, so that we
can, for instance, direct a user to web.archive.org if we do not have
a URL in our system, and vice versa.
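
As a rough sketch of that kind of fallback (not how our system works
today): try each archive in turn and hand the user the first
Wayback-style URL that has a capture. The function name, the
has_capture callback, and the archive.example.org prefix are
hypothetical; only the web.archive.org pattern comes from this thread.

```python
from typing import Callable, Optional, Sequence


def find_archived_copy(
    url: str,
    timestamp: str,
    archive_prefixes: Sequence[str],
    has_capture: Callable[[str], bool],
) -> Optional[str]:
    """Return the first Wayback-style URL that an archive can serve.

    archive_prefixes are tried in order, so put the local archive first.
    has_capture is a hypothetical callback; in practice it might issue
    an HTTP request and check the response status.
    """
    for prefix in archive_prefixes:
        # Same shape as the Internet Archive pattern: {prefix}/{timestamp}/{url}
        candidate = f"{prefix}/{timestamp}/{url}"
        if has_capture(candidate):
            return candidate
    return None
```

Passing the lookup in as a callback keeps the ordering logic separate
from however each archive actually reports what it holds.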

In terms of getting authors to cite archived material rather than live
web material, there are many approaches to this that I can think of,
for example:

a) Encouraging authors to link to archive.org or other web archives
rather than the live web;

b) Creating services to allow authors to take snapshots of websites,
like WebCite, if necessary;

c) Rewriting links in our system to point to archives, so that, for
instance, the reference (taken from the first Google search result for
“mla website citation” and, of course, broken):

Lynch, Tim. "DSN Trials and Tribble-ations Review." Psi Phi: Bradley's
Science Fiction Club. 1996. Bradley University. 8 Oct. 1997
<http://www.bradley.edu/campusorg/psiphi/DS9/ep/503r.html>.

would be rewritten to the working URL, based on the URL provided and
the access time (8 Oct. 1997):

<http://web.archive.org/web/19971008000000/http://www.bradley.edu/campusorg/psiphi/DS9/ep/503r.html>

d) Publicizing web archiving so that users know that they can use tools
like the web archive to find those broken links.

e) Providing browser plugins so that users who follow 404ed links can
be given the alternative of proceeding to an archived web site.
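
The rewriting in (c) could be sketched roughly as follows. This is
only an illustration, assuming the access date has the MLA-style
"day month year" shape of the example citation and that midnight is
an acceptable stand-in for the unknown time of day. The function
names are hypothetical; the URL pattern is the Internet Archive one
Ed quoted.

```python
from datetime import datetime

# Internet Archive pattern quoted earlier in this thread.
WAYBACK_PREFIX = "http://web.archive.org/web"

MONTHS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}


def parse_access_date(text: str) -> datetime:
    """Parse a date such as '8 Oct. 1997' (assumed 'day month year')."""
    day, month, year = text.split()
    # Keep the first three letters so 'Oct.', 'Sept.' and 'June' all match.
    month_number = MONTHS[month.rstrip(".")[:3]]
    return datetime(int(year), month_number, int(day))


def wayback_url(url: str, accessed: str) -> str:
    """Build a Wayback-style URL for `url` as of the cited access date."""
    when = parse_access_date(accessed)
    # The citation gives no time of day, so this yields 000000 (midnight).
    return f"{WAYBACK_PREFIX}/{when:%Y%m%d%H%M%S}/{url}"


print(wayback_url(
    "http://www.bradley.edu/campusorg/psiphi/DS9/ep/503r.html",
    "8 Oct. 1997",
))
```

Real citations are much messier than this, so the date parsing is
the part most likely to need work.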

best,
Erik Hetzner