On 24 Feb 2012, at 16:52, Ian Ibbotson wrote:
> Sorry.. late to the discussion...
>
> Isn't this a little apples and oranges?
>
> Surely robots.txt exists because many static resources are served directly
> from a tree structured filesystem?
>
> (Nearly) all OAI requests are responded to by specific service applications
> which are perfectly capable of deciding, on a resource by resource basis if
> an anonymous user should or should not see a given resource. As has been
> said, why would you list a resource in OAI if you didn't think **someone**
> would find it useful. If you want to take something out of circulation, you
> mark it deleted so that clients connecting for updates know it should be
> removed.
>
> OAI isn't about fully enumerating a tree on every visit to see what's new,
> it's about a short and efficient visit to say "What, if anything, has
> changed since I was last here". I don't want to have to walk an entire
> repository of 3 million items to discover item 2999999 was deleted.. I want
> a message to say "Oh, item 2999999 was removed on X".
>
I agree that OAI is an efficient way of harvesting content and finding changes, and for repositories on the scale of millions of items it may well be needed (although at that scale, perhaps other approaches such as data dumps and deltas would be even better?). However, most institutional repositories aren't close to this scale (yet?).
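
To make the "what has changed since I was last here" visit concrete, here is a rough sketch of an incremental OAI-PMH harvest. The `ListRecords` verb, the `from` argument and the `status="deleted"` header attribute come from the OAI-PMH 2.0 protocol; the base URL, the sample response and the function names are made up for illustration, and a real harvester would also need to follow resumptionTokens and handle errors.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace used by OAI-PMH 2.0 response elements.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def incremental_request_url(base_url, last_harvest, metadata_prefix="oai_dc"):
    """Build a ListRecords request for everything changed since last_harvest."""
    query = urlencode({
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "from": last_harvest,  # datestamp of the previous visit
    })
    return f"{base_url}?{query}"


def split_changes(response_xml):
    """Return (updated_ids, deleted_ids) found in a ListRecords response."""
    root = ET.fromstring(response_xml)
    updated, deleted = [], []
    for header in root.iter(OAI_NS + "header"):
        identifier = header.findtext(OAI_NS + "identifier")
        if header.get("status") == "deleted":
            deleted.append(identifier)
        else:
            updated.append(identifier)
    return updated, deleted


# Hypothetical response: one deleted record and one new or updated record.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header status="deleted">
      <identifier>oai:example.org:2999999</identifier>
      <datestamp>2012-02-20</datestamp>
    </header></record>
    <record><header>
      <identifier>oai:example.org:42</identifier>
      <datestamp>2012-02-21</datestamp>
    </header><metadata/></record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(incremental_request_url("http://example.org/oai", "2012-02-01"))
    print(split_changes(SAMPLE))
```

The harvester only ever sees the records that changed after the `from` date, so the deletion of item 2999999 arrives as a single header rather than requiring a walk of the whole repository.
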
I also agree there is a bit of apples and oranges here - they aren't exactly the same thing. However, in some scenarios - and I think really the main ones - the intended outcome seems to be the same. Google Scholar seems to me to be the main point of comparison: it harvests metadata (if correctly embedded in HTML meta tags), but does so by crawling web pages rather than via OAI-PMH. Because of the advantages of being in Google Scholar (people use it!), repositories support this mechanism anyway, which makes OAI-PMH an additional overhead. My investigations so far suggest that supporting these multiple routes leads to inconsistencies in how the different mechanisms are configured.
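
For comparison, the crawling route amounts to reading meta tags out of each item page. The sketch below assumes the repository embeds `citation_*` style tags (e.g. `citation_title`, `citation_author`), which is my understanding of what Google Scholar looks for; the sample page and the class and function names are invented for the example, and this is not a description of how Google's crawler actually works.

```python
from html.parser import HTMLParser


class CitationMetaParser(HTMLParser):
    """Collect <meta name="citation_*" content="..."> values from a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        content = attr.get("content")
        if name.startswith("citation_") and content is not None:
            # A tag such as citation_author can repeat, so keep a list.
            self.fields.setdefault(name, []).append(content)


def extract_citation_meta(html):
    """Return a dict mapping each citation_* tag name to its values."""
    parser = CitationMetaParser()
    parser.feed(html)
    return parser.fields


# Hypothetical item page from a repository.
SAMPLE_PAGE = """<html><head>
<title>An Example Paper</title>
<meta name="citation_title" content="An Example Paper">
<meta name="citation_author" content="A. Author">
<meta name="citation_author" content="B. Author">
<meta name="description" content="Not a citation tag">
</head><body></body></html>"""

if __name__ == "__main__":
    print(extract_citation_meta(SAMPLE_PAGE))
```

The same item ends up described twice, once in the OAI-PMH record and once in the page's meta tags, and the two are typically configured separately, which is where the inconsistencies can creep in.
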
I don't think my thoughts on it are completely clear either! But OAI-PMH is clearly 'niche' compared to the web, and while niche is sometimes needed, it always makes me slightly jumpy :)
Owen