Ah, hadn't heard about that service! I don't think any publishers/databases are likely to use it on their journal article pages so I'm probably safe. But it does have a 'lovely' example of all the annoying punctuation a DOI can legally contain...
Deborah
-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Joe Hourcle
Sent: Wednesday, 22 May 2013 2:03 p.m.
To: [log in to unmask]
Subject: Re: [CODE4LIB] DOI scraping
On May 21, 2013, at 9:40 PM, Fitchett, Deborah wrote:
> Joe and Owen--
>
> Thanks for the ideas!
>
> It's a bit of the opposite goal to LibX, in that rather than having a title/DOI/whatever from some random site and wanting to get to the full-text article, I'm looking at the use case of academics who are already viewing the full-text article and want a link that they can share with students. Even aside from the proxy prefix, the url in their browser may include (or consist entirely of) session gunk.
>
> I'll try a regexp and see how far that gets me. I'm a bit trepidatious about the way the DOI standard allows just about any character imaginable, but at least there's the 10. prefix. Am also considering that if DOIs also appear in the article's bibliography I'll need to make sure the javascript can distinguish between them and the DOI for the article itself; but a lot of this might be 'cross that bridge if I come to it' stuff.
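For what it's worth, a first-pass regexp along those lines might look like the sketch below. It assumes the scraper is working on plain text (for example a node's textContent or a meta tag's content attribute), that registrant codes are 4 to 9 digits, and that a DOI ends at the first whitespace. The names DOI_PATTERN and findDois are made up for illustration; nothing here comes from a library or from the DOI specification itself.

```javascript
// Sketch only: DOI_PATTERN and findDois are hypothetical names.
// "10." + registrant code (assumed 4-9 digits, optional dotted sub-codes)
// + "/" + a suffix made of any non-whitespace characters.
const DOI_PATTERN = /\b10\.\d{4,9}(?:\.\d+)*\/\S+/g;

function findDois(text) {
  const matches = text.match(DOI_PATTERN) || [];
  // Trailing . , ; : are legal in a DOI, but in running text they are
  // far more often sentence punctuation, so this sketch strips them.
  return matches.map((doi) => doi.replace(/[.,;:]+$/, ""));
}
```

Because the suffix is "anything up to whitespace", brackets, angle brackets and semicolons inside a DOI survive; the cost is that a DOI which really does end in a full stop or comma would be truncated. Telling the article's own DOI apart from the ones in its bibliography is a separate problem this does not attempt.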
Crap. I just remembered:
http://shortdoi.org/
... I don't know if any publishers are actually using them, or if they're just for people to use on twitter & other social media.
The real problem with them is that they don't have the '10.' string in them.
You can probably get away with just tracking the resolving form of them:
http://doi[.]org/(\w+)
And ignore the
10/(\w+)
form.
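As a rough sketch of that in JavaScript: the pattern below is the resolving form above with two additions that are assumptions, not part of the original suggestion. It also accepts https, and it skips links where the path starts with "10." so that an ordinary DOI behind doi.org is not reported as a short code of "10". The names SHORT_DOI_LINK and findShortDoiCodes are invented for the example.

```javascript
// Sketch only: SHORT_DOI_LINK and findShortDoiCodes are hypothetical names.
// The (?!10\.) lookahead is an added guard so full DOIs such as
// http://doi.org/10.1000/182 are not mistaken for a short code.
const SHORT_DOI_LINK = /https?:\/\/doi[.]org\/(?!10\.)(\w+)/g;

function findShortDoiCodes(text) {
  // matchAll yields one match object per link; index 1 is the captured code.
  return Array.from(text.matchAll(SHORT_DOI_LINK), (m) => m[1]);
}
```

Called on a string containing both http://doi.org/aabbe and http://doi.org/10.1000/182, this returns only the code from the first link; the bare 10/(\w+) form is deliberately left unmatched, as suggested.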
-Joe