LISTSERV 16.5 - CODE4LIB Archives

Houghton,Andrew writes:
 > > From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
 > > Mike Taylor
 > > Sent: Wednesday, April 01, 2009 9:35 AM
 > > To: [log in to unmask]
 > > Subject: Re: [CODE4LIB] registering info: uris?
 > > 
 > > Houghton,Andrew writes:
 > >  > So creating an info URI for it is meaningless, it's just another
 > >  > alias for the DOI.
 > > 
 > > Not quite.  Embedding a DOI in an info URI (or a URN) means that the
 > > identifier describes its own type.  If you just get the naked string
 > > 	10.1111/j.1475-4983.2007.00728.x
 > > passed to you, say as an rft_id in an OpenURL, then you can't tell
 > > (except by guessing) whether it's a DOI, a SICI, an ISBN or a
 > > biological species identifier.  But if you get
 > > 	info:doi/10.1111/j.1475-4983.2007.00728.x
 > > then you know what you've got, and can act on it accordingly.
 > 
 > Now you are changing the argument to a specific resolution mechanism,
 > e.g., OpenURL.

Not at all.  It applies to any situation where an identifier must
stand alone, without the benefit of additional metadata.

Imagine your web-browser extended by a plugin that knows how to
resolve particular kinds of info: URIs.  If you just paste the raw
DOI into the URI bar, it won't have a clue what to do with it, but the
wrapped-in-a-URI version stands alone and self-describes, so the
plugin can pull it apart and say, "ah yes, this URI is a DOI, and I
know how my user has configured me to resolve those."
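To make that concrete, here is a minimal sketch of what such a plugin
might do.  The RESOLVERS table and the function names are hypothetical,
invented for illustration; the only thing taken from the discussion is
the info:doi/ form itself, and the dx.doi.org target is just one way a
user might have configured things.

```python
# Hypothetical user configuration: info namespace -> resolver URL template.
RESOLVERS = {
    "doi": "http://dx.doi.org/{id}",
}


def split_info_uri(uri):
    """Return (namespace, identifier) for an info: URI, else None."""
    scheme, colon, rest = uri.partition(":")
    if not colon or scheme.lower() != "info":
        return None  # a naked string or some other scheme: nothing to go on
    namespace, slash, identifier = rest.partition("/")
    if not slash or not namespace or not identifier:
        return None
    return namespace.lower(), identifier


def resolve(uri):
    """Map a self-describing identifier to a URL, or None if unrecognised."""
    parsed = split_info_uri(uri)
    if parsed is None:
        return None
    namespace, identifier = parsed
    template = RESOLVERS.get(namespace)
    if template is None:
        return None  # an info: URI, but of a kind the user has not configured
    return template.format(id=identifier)
```

Given the wrapped form, resolve() can dispatch on the namespace; given
the naked string, it has nothing to dispatch on and returns None.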

 > OpenURL could have easily defined rft_idType where you specified
 > DOI, SICI, ISBN, etc. along with its actual identifier value in
 > rft_id.

Happily, the committee appreciate all the reasons why it's better for
identifiers to be self-describing.

Note, by the way, that OpenURL 1.0 does provide a bolt-hole for doing
something like what you describe here, using the rft_dat element to
provide a private identifier together with an rfr_id saying what
referrer provided the identifier, which is a proxy for saying what the
type of the private identifier is.  Z39.88-2004, Part 1, Section 5.2
gives this example:
	rft_dat = cites/8///citedby/12
	rfr_id = info:sid/elsevier.com:ScienceDirect
As the standard rather optimistically says, "Knowing the identity of
the Referrer might help the Resolver to interpret the Private Data."

It seems clear to me that this mechanism is pretty nasty.  Apart from
anything else, rft_id's, which are self-describing URIs, can repeat --
you can give multiple IDs, each self-contained -- whereas you can't
provide repeating _pairs_ of rft_dat/rfr_id, because the order of
elements may not be defined depending on your transport (e.g. query
parameters in an HTTP GET transport).
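A small sketch of the problem, assuming an HTTP GET transport and
Python's standard query-string parser.  The repeated rft_dat/rfr_id
parameters, the example.org referrer IDs and the zeroed ISBN are all
made up for illustration; this is not something the standard sanctions.

```python
from urllib.parse import parse_qs

# Repeated rft_id values: each one is self-describing, so order is irrelevant.
ids = parse_qs(
    "rft_id=info:doi/10.1111/j.1475-4983.2007.00728.x"
    "&rft_id=urn:isbn:0000000000"  # placeholder ISBN, illustration only
)["rft_id"]

# Hypothetical repeated pairs: the same four parameters in two orders.
q1 = ("rft_dat=A&rfr_id=info:sid/example.org:one"
      "&rft_dat=B&rfr_id=info:sid/example.org:two")
q2 = ("rfr_id=info:sid/example.org:two&rft_dat=A"
      "&rfr_id=info:sid/example.org:one&rft_dat=B")


def pair_by_position(query):
    """Pair the nth rft_dat with the nth rfr_id: the only option available."""
    fields = parse_qs(query)
    return list(zip(fields.get("rft_dat", []), fields.get("rfr_id", [])))
```

Both queries carry the same parameters, but pair_by_position(q1) and
pair_by_position(q2) attach the private data to different referrers,
whereas the rft_id list means the same thing in any order.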

 > However, given that OpenURL didn't do this, there is no difference
 > plugging either of the following URIs into rft_id:
 > 
 > http://dx.doi.org/10.1111/j.1475-4983.2007.00728.x
 > info:doi/10.1111/j.1475-4983.2007.00728.x 
 > 
 > when I identify the HTTP URI as a Real World Object.

I think there is a difference at least in connotation.  What you seem
to be suggesting (are you?) is that in the former case, the resolver
should recognise that the HTTP URL matches the regular expression
	^http://dx\.doi\.org/(.*)$
and so extract the match and go off and do something else with it.
Notice that in general you do need something like regular expressions
for this -- simple prefix matching won't do it -- as the "actionable
identifier" might be something uglier like:
	http://id.org/resolve?id=10.1111/j.1475-4983.2007.00728.x&type=doi

I think that inviting resolvers to parse HTTP URLs and look for
special cases is not a wise direction to go.
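For illustration, here is roughly what that special-casing looks like
in a resolver.  This is a sketch only: the PATTERNS list and
extract_doi() are hypothetical names, and the id.org pattern covers
nothing more than the made-up URL shape given above.

```python
import re

# One hand-written pattern per "actionable" URL shape the resolver knows about.
PATTERNS = [
    re.compile(r"^http://dx\.doi\.org/(.*)$"),
    # Matches only the exact parameter order of the example URL.
    re.compile(r"^http://id\.org/resolve\?id=([^&]+)&type=doi$"),
]


def extract_doi(url):
    """Return the DOI embedded in a recognised HTTP URL, else None."""
    for pattern in PATTERNS:
        match = pattern.match(url)
        if match:
            return match.group(1)
    return None  # an unfamiliar URL shape: the resolver cannot tell what it is
```

Every new URL shape needs another entry in the list, and a URL that
swaps the order of its query parameters already falls through, which is
the kind of fragility that the info:doi/ form avoids.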

 _/|_	 ___________________________________________________________________
/o ) \/  Mike Taylor    <[log in to unmask]>    http://www.miketaylor.org.uk
)_v__/\  "There is hopeful symbolism in the fact that flags do not wave
	 in a vacuum" -- Arthur C. Clarke, on the moon landings.