Ed Summers wrote:

> Thanks for posting this Jakob. I was just reading RFC 5005 on the
> train yesterday (literally) and the parallels between it and OAI-PMH
> struck me as well. It's not quite clear to me how deleted records
> would be handled with an atom archive feed. But I guess one could
> assume that if the identifier is no longer present it has been deleted.
> But that would require pulling the entire archive... I'm not really
> sure how much deletes are really used in OAI-PMH repositories anyhow.

OAI-PMH 1.1 was not clear enough about deletions, but the 2.0
specification contains an example. I think the lack of deletion support
in data providers is linked to the lack of explicit support in service
providers, and vice versa (a chicken-and-egg problem).
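To illustrate what explicit deletion support looks like on the service-provider side: in OAI-PMH 2.0 a deleted record comes back as a <record> whose <header> carries status="deleted" and has no <metadata>. The following is only a minimal sketch using Python's standard library; the function name split_records and the sample response (including the example.org identifiers) are hypothetical.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def split_records(list_records_xml):
    """Return (live_ids, deleted_ids) from an OAI-PMH 2.0 ListRecords response."""
    root = ET.fromstring(list_records_xml)
    live, deleted = [], []
    for record in root.iter(OAI + "record"):
        header = record.find(OAI + "header")
        identifier = header.findtext(OAI + "identifier")
        # OAI-PMH 2.0 flags deletions with status="deleted" on the <header>
        if header.get("status") == "deleted":
            deleted.append(identifier)
        else:
            live.append(identifier)
    return live, deleted


# Hypothetical response; the live record's <metadata> is omitted for brevity
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2008-01-01T00:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">http://example.org/oai</request>
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier><datestamp>2008-01-01</datestamp></header>
    </record>
    <record>
      <header status="deleted"><identifier>oai:example.org:2</identifier><datestamp>2008-01-02</datestamp></header>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(split_records(SAMPLE))
```

A harvester that handles the deleted list explicitly can remove records incrementally instead of re-pulling the whole repository to spot missing identifiers.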

> Stuart Weibel has written [1] about the subject of blog archiving in
> the past. And I remember hearing Jon Udell and Dan Chudnov talk about
> it [2]. Who knows what technorati, bloglines and googlereader are
> doing in this area. I guess the reality is that blogs are on the web
> and as such will be archived by InternetArchive [3]. But perhaps that
> doesn't really fit quite right? That's my feeling.

Thanks. BlogML was new to me. It sounds interesting but looks rough and
over-engineered: you cannot even get the spec as HTML. Instead you have
to download an archive that contains tons of nasty .NET files and an XML
schema, rather than a textual description with examples and discussion.
I copied the XML schema here: http://www.gbv.de/wikis/cls/BlogML. I
think extending Atom is the better way.
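RFC 5005, mentioned in the quoted message above, is one such Atom extension. The subscription feed links to older archive documents via rel="prev-archive", so a harvester can walk back to the oldest archive and collect every entry. This also makes Ed's point concrete: inferring deletions this way means fetching the whole chain. The sketch below is hypothetical. The fetch callable stands in for HTTP, hrefs are assumed to be absolute, and keeping the first copy of an atom:id seen while walking newest to oldest is a simplifying assumption of this sketch.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def prev_archive(feed):
    """Return the href of the rel="prev-archive" link (RFC 5005), or None."""
    for link in feed.findall(ATOM + "link"):
        if link.get("rel") == "prev-archive":
            return link.get("href")
    return None


def harvest_archive(start_url, fetch):
    """Follow prev-archive links from the subscription feed to the oldest archive.

    `fetch` is a hypothetical callable mapping a URL to feed XML.
    Returns {atom:id: title}, keeping the newest copy of each entry.
    """
    entries, seen_urls, url = {}, set(), start_url
    while url and url not in seen_urls:  # guard against link loops
        seen_urls.add(url)
        feed = ET.fromstring(fetch(url))
        for entry in feed.findall(ATOM + "entry"):
            entries.setdefault(entry.findtext(ATOM + "id"), entry.findtext(ATOM + "title"))
        url = prev_archive(feed)
    return entries


# Hypothetical in-memory "web" instead of real HTTP requests
FEEDS = {
    "http://example.org/feed": """<feed xmlns="http://www.w3.org/2005/Atom">
      <link rel="prev-archive" href="http://example.org/archive/1"/>
      <entry><id>urn:x:2</id><title>Second</title></entry>
    </feed>""",
    "http://example.org/archive/1": """<feed xmlns="http://www.w3.org/2005/Atom"
          xmlns:fh="http://purl.org/syndication/history/1.0">
      <fh:archive/>
      <entry><id>urn:x:1</id><title>First</title></entry>
    </feed>""",
}

print(harvest_archive("http://example.org/feed", FEEDS.__getitem__))
```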

> I think your general point is correct. Libraries need to be
> integrating themselves into the web these days rather than expecting
> the web to integrate into them.

I doubt that archiving weblogs is that complicated [1]! You need a
harvester (partly implemented in many feed readers), an archive (you
could start by just saving validated Atom files), an index (Solr?) and
a reader (also already implemented in many feed readers). I bet you
don't need more than a medium-sized project with one or two developers
and one or two years to create sustainable tools for basic weblog
archiving. Any larger library or archive that is able to get funding
could do such a project. It's not a lack of resources, it's a lack of
vision.
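To show how small the archive and index parts can start, here is a toy sketch. It keeps the newest version of each entry by atom:id and atom:updated and maintains a tiny in-memory word index. That index is only a stand-in for something like Solr, not a use of its API. The class BlogArchive and the helper make_feed are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import datetime

ATOM = "{http://www.w3.org/2005/Atom}"


def parse_time(value):
    # Atom dates are RFC 3339; map a trailing "Z" to "+00:00" for older fromisoformat()
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


class BlogArchive:
    """Toy archive plus index: keeps the newest version of each entry by atom:id."""

    def __init__(self):
        self.entries = {}              # atom:id -> (updated, title, content)
        self.index = defaultdict(set)  # lower-cased word -> set of atom:ids

    def ingest(self, feed_xml):
        feed = ET.fromstring(feed_xml)  # raises ET.ParseError on malformed XML
        for entry in feed.iter(ATOM + "entry"):
            entry_id = entry.findtext(ATOM + "id")
            updated = parse_time(entry.findtext(ATOM + "updated"))
            title = entry.findtext(ATOM + "title", default="")
            content = entry.findtext(ATOM + "content", default="")
            stored = self.entries.get(entry_id)
            if stored is None or updated > stored[0]:
                self.entries[entry_id] = (updated, title, content)
        self._reindex()

    def _reindex(self):
        # Rebuild from scratch so replaced versions leave no stale words behind
        self.index = defaultdict(set)
        for entry_id, (_, title, content) in self.entries.items():
            for word in re.findall(r"\w+", f"{title} {content}".lower()):
                self.index[word].add(entry_id)

    def search(self, word):
        return sorted(self.index.get(word.lower(), set()))


def make_feed(updated, content):
    # Hypothetical one-entry feed used only for this example
    return f"""<feed xmlns="http://www.w3.org/2005/Atom"><entry>
      <id>urn:x:1</id><updated>{updated}</updated>
      <title>Weblog archiving</title><content>{content}</content>
    </entry></feed>"""


archive = BlogArchive()
archive.ingest(make_feed("2008-01-01T10:00:00Z", "A first draft"))
archive.ingest(make_feed("2008-01-02T10:00:00Z", "Notes on harvesting"))
archive.ingest(make_feed("2008-01-01T10:00:00Z", "A first draft"))  # older copy is ignored
print(archive.search("harvesting"), archive.search("draft"))
```

Rebuilding the whole index on every ingest is deliberately naive. A real project would update incrementally or hand documents to a search server.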

> Oh, and would it be alright to add your blog to
> http://planet.code4lib.org -- we need more of an international
> presence on there IMHO.

The subfeed http://jakoblog.de/category/en/feed/atom/ contains all
English-language postings, which are probably the more relevant ones.

Jakob

[1] OK, real long-term preservation *is* complicated, but if you only
archive well-formed XML that conforms to a given schema (Atom, HTML) you
should be in a good position for the next few decades.
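A minimal gatekeeper for "only archive well-formed XML" might look like the sketch below. It checks well-formedness and that the root element is an Atom <feed>, using only the standard library. Full validation against the RELAX NG schema given in RFC 4287 would need an extra library such as lxml and is not attempted here. The function name is_archivable is hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM_FEED = "{http://www.w3.org/2005/Atom}feed"


def is_archivable(document):
    """Accept only well-formed XML whose root element is an Atom <feed>.

    Checks well-formedness and the root element only; it does not
    validate the document against the Atom schema.
    """
    try:
        root = ET.fromstring(document)
    except ET.ParseError:
        return False
    return root.tag == ATOM_FEED


print(is_archivable('<feed xmlns="http://www.w3.org/2005/Atom"/>'))
print(is_archivable('<feed xmlns="http://www.w3.org/2005/Atom">'))
print(is_archivable('<rss version="2.0"/>'))
```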

--
Jakob Voß <[log in to unmask]>, skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242, http://www.gbv.de