Joe and Owen--
Thanks for the ideas!
It's somewhat the opposite goal to LibX: rather than starting from a title/DOI/whatever on some random site and wanting to get to the full-text article, I'm looking at the use case of academics who are already viewing the full-text article and want a link that they can share with students. Even aside from the proxy prefix, the URL in their browser may include (or consist entirely of) session gunk.
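To make the end goal concrete, here is a minimal sketch of building the shareable link from a bare DOI rather than from whatever is in the address bar. The proxy prefix below is a made-up EZproxy-style placeholder, and the function name is hypothetical; the real prefix would come from the library's own proxy configuration.

```javascript
// ASSUMPTION: a placeholder EZproxy-style login prefix, not a real institutional one.
const PROXY_PREFIX = "https://ezproxy.example.edu/login?url=";

// Build a shareable link from a DOI alone, so session parameters in the
// current page URL never make it into the link. (Hypothetical helper.)
function proxiedPermalink(doi, proxyPrefix = PROXY_PREFIX) {
  // DOIs may contain characters such as "#", "?", "<" or spaces, so
  // percent-encode everything and then restore "/" for readability.
  const encodedDoi = encodeURIComponent(doi).replace(/%2F/g, "/");
  return proxyPrefix + "https://doi.org/" + encodedDoi;
}

console.log(proxiedPermalink("10.1000/182"));
```

Starting from the DOI sidesteps the session gunk entirely, since nothing from the browser's URL is reused.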
I'll try a regexp and see how far that gets me. I'm a bit trepidatious about the way the DOI standard allows just about any character imaginable, but at least there's the "10." prefix. I'm also considering that if DOIs appear in the article's bibliography as well, I'll need to make sure the JavaScript can distinguish between them and the DOI for the article itself; but a lot of this might be 'cross that bridge if I come to it' stuff.
(As may be jQuery... :-) )
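As a rough sketch of the regexp approach, assuming a deliberately loose pattern (the real DOI syntax permits more characters than this accepts) and assuming the article's own DOI is more likely to sit in page metadata than in the body text: the names DOI_PATTERN, findDois and pickArticleDoi are illustrative, not taken from any existing bookmarklet.

```javascript
// ASSUMPTION: a loose pattern, not the full DOI syntax. It matches "10.",
// a 4-9 digit registrant code, "/", then a run of characters that are not
// whitespace, quotes or angle brackets (so older DOIs containing "<" or ">"
// will be cut short).
const DOI_PATTERN = /\b10\.\d{4,9}\/[^\s"'<>]+/g;

// Return every distinct DOI-like string in the text, in order of appearance.
function findDois(text) {
  const matches = text.match(DOI_PATTERN) || [];
  // Drop sentence punctuation that got swept up, e.g. "doi:10.1000/182."
  const cleaned = matches.map((m) => m.replace(/[.,;:]+$/, ""));
  return [...new Set(cleaned)];
}

// Prefer DOIs found in metadata values; fall back to the first DOI in the
// body text, which may well be a bibliography entry rather than the article.
function pickArticleDoi(metaValues, bodyText) {
  for (const value of metaValues) {
    const found = findDois(value || "");
    if (found.length > 0) return found[0];
  }
  const fromBody = findDois(bodyText || "");
  return fromBody.length > 0 ? fromBody[0] : null;
}

// In a bookmarklet the inputs might be gathered like this (the tag name
// varies by publisher; "citation_doi" is one commonly used name):
//   const metas = [...document.querySelectorAll('meta[name="citation_doi"]')]
//     .map((el) => el.getAttribute("content"));
//   const doi = pickArticleDoi(metas, document.body.innerText);
```

Checking metadata first is one way to avoid picking up a DOI from the reference list; where a site has no such metadata, the body-text fallback is only a guess and would need per-database handling.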
Deborah
-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Owen Stephens
Sent: Friday, 17 May 2013 9:01 p.m.
To: [log in to unmask]
Subject: Re: [CODE4LIB] DOI scraping
I'd say yes to the investment in jQuery generally - it's not too difficult to get the basics if you already use JavaScript, and it makes some things a lot easier.
It sounds like you are trying to do something not dissimilar to LibX (http://libx.org), except via a bookmarklet rather than a browser plugin?
Also, if you're looking for custom database scrapers, it might be worth looking at Zotero translators: they already exist for many major sources and, I guess, will be grabbing the DOI where it exists if they can. See http://www.zotero.org/support/dev/translators
Owen
Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: [log in to unmask]
Telephone: 0121 288 6936
On 17 May 2013, at 05:32, "Fitchett, Deborah" <[log in to unmask]> wrote:
> Kia ora koutou,
>
> I’m wanting to create a bookmarklet that will let people on a journal article webpage just click the bookmarklet and get a permalink to that article, including our proxy information so it can be accessed off-campus.
>
> Once I’ve got a DOI (or other permalink, but I’ll cross that bridge later), the rest is easy. The trouble is getting the DOI. The options seem to be:
>
> 1. Require the user to locate and manually highlight the DOI on the page. This is very easy to code, not so easy for the user who may not even know what a DOI is let alone how to find it; and some interfaces make it hard to accurately select (I’m looking at you, ScienceDirect).
>
> 2. Live in hope of universal COinS implementation. I might be waiting a long time.
>
> 3. Work out, for each database we use, how to scrape the relevant information from the page. Harder/tedious to code, but makes it easy for the user.
>
> I’ve been looking around for existing code that does something like #3. So far I’ve found:
>
> · CiteULike’s bookmarklet (jQuery at http://www.citeulike.org/bm - afaik it’s all rights reserved)
>
> · Altmetric’s bookmarklet (jQuery at http://altmetric-bookmarklet.dsci.it/assets/content.js - MIT licensed)
>
> Can anyone think of anything else I should be looking at for inspiration?
>
> Also on a more general matter: I have the general level of JavaScript
> that one gets by poking at things and doing small projects and then
> getting distracted by other things and then coming back some months
> later for a different small project and having to relearn it all over
> again. I’ve long had jQuery on my “I guess I’m going to have to learn
> this someday but, um, today I just wanna stick with what I know” list.
> So is this the kind of thing where it’s going to be quicker to learn
> something about jQuery before I get started, or can I just as easily
> muddle along with my existing limited JavaScript? (What really are the
> pros and cons here?)
>
> Nāku noa, nā
>
> Deborah Fitchett
> Digital Access Coordinator
> Library, Teaching and Learning
>
> p +64 3 423 0358
> e [log in to unmask]
> w library.lincoln.ac.nz <http://library.lincoln.ac.nz/>
>
> Lincoln University, Te Whare Wānaka o Aoraki
> New Zealand's specialist land-based university
>
>
> ________________________________
> Please consider the environment before you print this email.
> "The contents of this e-mail (including any attachments) may be
> confidential and/or subject to copyright. Any unauthorised use,
> distribution, or copying of the contents is expressly prohibited. If you have received this e-mail in error, please advise the sender by return e-mail or telephone and then delete this e-mail together with all attachments from your system."
>