Ed Summers wrote:

> Thanks for posting this Jakob. I was just reading RFC 5005 on the
> train yesterday (literally) and the parallels between it and OAI-PMH
> struck me as well. It's not quite clear to me how deleted records
> would be handled with an atom archive feed. But I guess one could
> assume if the identifier is no longer present it has been deleted it.
> But that would require pulling the entire archive... I'm not really
> sure how much deletes are really used in OAI-PMH repositories anyhow.

OAI-PMH 1.1 was not clear enough on deletions, but the 2.0 specification
contains an example. I think the missing support for deletions in data
providers has to do with the missing explicit support in service
providers, and vice versa (a chicken-and-egg problem).

> Stuart Weibel has written [1] about the subject of blog archiving in
> the past. And I remember hearing Jon Udell and Dan Chudnov talk about
> it [2]. Who knows what technorati, bloglines and googlereader are
> doing in this area. I guess the reality is that blogs are on the web
> and as such will be archived by InternetArchive [3]. But perhaps that
> doesn't really fit quite right?

That's my feeling. Thanks. BlogML was new to me. It sounds interesting
but looks very shaggy and over-engineered: you do not even get the spec
in HTML but have to download an archive that contains tons of nasty .NET
files and an XML schema, instead of a textual description with examples
and discussion. I copied the XML schema here:
<http://www.gbv.de/wikis/cls/BlogML>. I think extending Atom is the
better way.

> I think your general point is correct. Libraries need to be
> integrating themselves into the web these days rather than expecting
> the web to integrate into them.

I doubt that archiving weblogs is that complicated [1]! You need a
harvester (partly implemented in many feed readers), an archive (you
could start with just saving validated Atom files), an index (Solr?) and
a reader (also already implemented in many feed readers).
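
To make the harvester and archive steps concrete, here is a minimal
sketch using only the Python standard library. It assumes the feed
exposes RFC 5005 "prev-archive" links with absolute URLs, and it treats
"validated" as merely "well-formed Atom feed document". The names
parse_feed, harvest and presumed_deleted, and the caller-supplied fetch
function, are hypothetical and chosen only for illustration. The last
function shows the guess discussed above: an identifier that was
harvested earlier but is no longer present is presumed deleted, which
does require walking the entire archive.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace (RFC 4287)


def parse_feed(xml_text):
    """Parse one Atom feed document.

    Raises ET.ParseError if the text is not well-formed XML and
    ValueError if the root element is not atom:feed.
    Returns (entry ids, URL of the previous archive document or None).
    """
    root = ET.fromstring(xml_text)
    if root.tag != ATOM + "feed":
        raise ValueError("not an Atom feed document")
    ids = [entry.findtext(ATOM + "id") for entry in root.findall(ATOM + "entry")]
    prev = None
    for link in root.findall(ATOM + "link"):
        if link.get("rel") == "prev-archive":  # link relation from RFC 5005
            prev = link.get("href")  # assumption: href is an absolute URL
    return ids, prev


def harvest(start_url, fetch):
    """Walk from the subscription document back through all archive documents.

    `fetch` is a caller-supplied function that maps a URL to XML text.
    Returns ({url: xml_text}, set of all entry ids seen).
    """
    archive, seen_ids = {}, set()
    url = start_url
    while url and url not in archive:  # the second test guards against link cycles
        text = fetch(url)
        ids, prev = parse_feed(text)
        archive[url] = text  # "archive" here is just the saved documents
        seen_ids.update(i for i in ids if i)
        url = prev
    return archive, seen_ids


def presumed_deleted(previous_ids, current_ids):
    """Ids harvested earlier that are absent now (a guess, not a deletion notice)."""
    return set(previous_ids) - set(current_ids)
```
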
I bet you don't need more than a medium-sized project with one or two
developers and one or two years to create sustainable tools for basic
weblog archiving. Such a project could be done by any larger library or
archive that is able to get funding. It's not a lack of resources, it's
a lack of vision.

> Oh, and would it be alright to add your blog to
> http://planet.code4lib.org -- we need more of an international
> presence on there IMHO.

The subfeed http://jakoblog.de/category/en/feed/atom/ contains all
English-language postings, which are probably of higher relevance.

Jakob

[1] Ok, real long-term preservation *is* complicated, but if you only
archive well-formed XML that conforms to a given schema (Atom, HTML) you
should be in a good position for the next decades.

-- 
Jakob Voß <[log in to unmask]>, skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242, http://www.gbv.de