LISTSERV 16.5 - CODE4LIB Archives

--- On Fri, 14 Jan 2011, Mike Taylor wrote:
> One reason persistent IDs are better than persistent URLs is that you
> can Google them.  You see this with DOIs: it's true that there is a
> well-known resolution service that you can use for DOIs if you're so
> inclined, but actually a simple web-search for, say, 10.1144/SP343.22
> will get you what you need.  Same for ISBNs.

Couldn't one say "one reason for highly unique IDs is that you can Google
them"?  Highly unique IDs are routinely embedded in URLs such as 
http://dx.doi.org/10.1144/SP343.22 and http://n2t.net/ark:/13030/xt14pw6,
and being in a URL doesn't make them less googleable.
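
A minimal sketch of that point, in Python: pull the bare identifier back out of
a resolver URL so it can be pasted into a search box. The function name and both
regular expressions are illustrative assumptions, not a complete grammar for
DOIs or ARKs.

```python
import re
from urllib.parse import urlparse, unquote

# Rough, assumed patterns: good enough for the two example URLs above,
# not a full specification of either identifier scheme.
DOI_RE = re.compile(r"10\.\d{4,9}/\S+")
ARK_RE = re.compile(r"ark:/?\d{5,9}/\S+")


def extract_identifier(url):
    """Return the DOI or ARK embedded in a URL's path, or None if neither is found."""
    path = unquote(urlparse(url).path).lstrip("/")
    for pattern in (DOI_RE, ARK_RE):
        match = pattern.search(path)
        if match:
            return match.group(0)
    return None


print(extract_identifier("http://dx.doi.org/10.1144/SP343.22"))  # 10.1144/SP343.22
print(extract_identifier("http://n2t.net/ark:/13030/xt14pw6"))   # ark:/13030/xt14pw6
```

The identifier survives the round trip unchanged, which is the sense in which
wrapping it in a URL costs nothing in searchability.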

-John

>
> On 14 January 2011 20:29, Kyle Banerjee wrote:
>>> This attitude makes sense only if you are used to very bad “persistent
>>> URL” systems. A URI is an identifier. Making it persistent is our job.
>>> Using a different identifier scheme won’t make our job easier.
>>>
>> I totally agree with all these statements as well as with the sentiment that
>> the approach I advocate is far from optimal.
>>
>> My basic philosophy is that: 1) the greatest weakness in any system can be
>> found in the carbon-based liveware it depends on (i.e. people act like
>> people); 2) you can totally count on the second law of thermodynamics (the
>> entropy of a closed system always increases); and 3) there is too much work
>> to go around.
>>
>> Translated for the case at hand, this means: 1) people will inevitably not
>> have enough time to do it right; 2) data get more complicated and less
>> consistent; 3) the problems aren't going to be fixed. As a result,
>> methods/systems need to be engineered accordingly. This makes our job hard,
>> but that's employment security for us as that's where we contribute value to
>> the equation.
>>
>>
>>> can you give a practical example? I can see embedding an id somewhere in a
>>> digital file, and then creating a link to it as part of the indexing
>>> process, but what about external content that we have no control over... yet
>>> are expected to reference in a consistent way?
>>>
>>
>> As you observe, reality is messy. With regard to externally referenced
>> content, the options are limited. Ideally, the provider embeds their own
>> identifier either because they just do it, or they were convinced of the
>> value of doing so.
>>
>> The reason I favor not being too prescriptive of syntax is that identifiers
>> are insanely useful and if you ask people to do anything they don't
>> understand or want to mess with, you'll inevitably find they ignore you
>> because they have too many other things to worry about. For maximum
>> compliance, barriers need to be as low as possible.
>>
>> But to get back to the example, let's suppose they don't provide any kind of
>> identifier no matter how much you bug them. Guess what the resolution
>> service provider's chances are of being informed if they move all the
>> content or even worse, change the system that serves the content?
>>
>>> Has anyone thought through, or put into practice, using Apache mod_rewrite
>>> tables for this simple "redirect one URL to another" use case?
>>>
>>
>> Unless the URLs being directed to can be predicted from the source URLs (an
>> assumption that is only safe in certain types of closed systems), this is
>> just a different type of resolution service that suffers from all the same
>> issues as purls and handles.
>>
>> To summarize this long email into a single sentence, you'll notice the ideas
>> that work the best and prove the most adaptable in the long run are simple
>> and compelling.
>>
>> kyle
>>
>>
>
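
The distinction drawn in the quoted mod_rewrite exchange (targets that can be
predicted from the source URL versus targets that have to be looked up) can be
sketched in a few lines of Python. This is only an illustration under stated
assumptions: every host name, path, and table entry below is hypothetical, and
it is not a stand-in for an actual Apache configuration.

```python
import re

# Hypothetical rule: old and new URLs differ in a predictable way, so one
# pattern covers every item and never needs updating.
RULE = (
    re.compile(r"^http://old\.example\.org/items/(\d+)$"),
    r"http://new.example.org/record/\1",
)

# Hypothetical lookup table: no pattern relates source to target, so every
# entry has to be maintained by someone who hears about each move.
TABLE = {
    "http://old.example.org/report.pdf": "http://new.example.org/docs/2011/report.pdf",
}


def resolve(url):
    """Return the redirect target for url, or None if nothing is known about it."""
    pattern, template = RULE
    if pattern.match(url):
        return pattern.sub(template, url)
    return TABLE.get(url)


print(resolve("http://old.example.org/items/42"))    # http://new.example.org/record/42
print(resolve("http://old.example.org/report.pdf"))  # http://new.example.org/docs/2011/report.pdf
print(resolve("http://old.example.org/moved-silently.html"))  # None
```

The rule branch keeps working without attention; the table branch is only as
good as its last update, which is the same maintenance burden attributed above
to purls and handles.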