Good call. I ended up rewriting the bookmarklet to look at NYT's <meta>
tags rather than the body text, since they are more uniform and usually
contain the info we need. The tags are used fairly consistently across
the archives for the years I spot-checked (2003-present).
The script now looks for a published title in the meta tags and, if it
doesn't find one, falls back to the headline used in the article. I changed
the date search to a fuzzy match by month and year rather than by exact
date, since the NYT pubdate can be inconsistent with Lexis's.
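A rough sketch of the title fallback and the month/year match described
above. This is not the code in the gist. The meta names "hdl_p" and "hdl",
the YYYYMMDD date format, and all function names here are assumptions made
for illustration:

```javascript
// Hypothetical input: `meta` is a plain object mapping meta-tag names to
// their content values, e.g. collected in the browser from
// document.getElementsByTagName('meta').
function pickTitle(meta) {
  // Prefer the published (print) title; fall back to the web headline.
  // 'hdl_p' and 'hdl' are assumed tag names, not confirmed by the thread.
  return meta.hdl_p || meta.hdl || null;
}

// Parse a compact YYYYMMDD string into { year, month }, ignoring the day.
// Returns null when the string is not in the assumed format.
function parseYearMonth(yyyymmdd) {
  const m = /^(\d{4})(\d{2})\d{2}$/.exec(yyyymmdd);
  return m ? { year: Number(m[1]), month: Number(m[2]) } : null;
}

// Fuzzy date match: true when both dates fall in the same month and year.
function sameMonthAndYear(a, b) {
  const x = parseYearMonth(a);
  const y = parseYearMonth(b);
  return x !== null && y !== null &&
    x.year === y.year && x.month === y.month;
}
```

One limit of matching this way: two dates a day apart that straddle a
month boundary (April 30 and May 1, say) will not match.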
Highly unscientific and not 100% accurate, so this bookmarklet's
definitely a bit of a hack, but should work *most* of the time for
articles that made it into the print version.
https://gist.github.com/944809#file_nyt_lexis_bookmarklet_meta_tags.js
--
Erin White
Web Applications Developer, VCU Libraries
804-827-3552 | [log in to unmask] | http://library.vcu.edu/
From: Bob Duncan <[log in to unmask]>
To: [log in to unmask]
Date: 04/29/2011 10:42 AM
Subject: Re: [CODE4LIB] NY Times Bookmarklet
Sent by: Code for Libraries <[log in to unmask]>
>Date: Wed, 27 Apr 2011 09:10:20 -0400
>From: "Van Mil, James (vanmiljf)" <[log in to unmask]>
>Subject: NY Times Bookmarklet
>. . .
>However, every article at the web version of the NY Times that was
>also published in the print version includes a reference to the
>article from the print edition, including date, page number, and
>print version title (information which is all still accessible in
>the page source when the paywall blocks access).
I wish this were true, but unfortunately, it's not. Not every
reference to the print version includes the print version
headline. In fact, it appears that including the print headline is a
fairly recent addition to the Times Website. (Very unscientific
searching suggests it started within the last few weeks.) I wonder
if it might make more sense to grab the author's name and pass that
with the print pub date to PQ/LexisNexis instead -- most articles
seem to include a byline. Or grab the beginning sentence and pass
that. (You'd have to get rid of any anchor elements.) It also
appears that not every article that's published in print includes a
reference to the print version in the Web version, but most seem to.
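A minimal sketch of the two alternatives suggested above: pulling an
author name out of a byline, and taking the opening sentence with anchor
elements removed. The function names and the "By ..." byline format are
assumptions for illustration, and the sentence splitter is deliberately
naive:

```javascript
// Turn a byline such as "By JANE DOE" into just the name.
// Assumes the byline starts with "By"; anything else is returned trimmed.
function authorFromByline(byline) {
  return byline.replace(/^\s*by\s+/i, '').trim();
}

// Remove <a ...> and </a> tags but keep the link text between them.
// The \b keeps other tags that start with "a" (e.g. <abbr>) untouched.
function stripAnchors(html) {
  return html.replace(/<\/?a\b[^>]*>/gi, '');
}

// Take everything up to the first '.', '!' or '?' that is followed by
// whitespace or the end of the text. Abbreviations such as "Mr." will
// cut the sentence short, so treat the result as approximate.
function firstSentence(text) {
  const trimmed = text.trim();
  const m = /^[\s\S]*?[.!?](?=\s|$)/.exec(trimmed);
  return m ? m[0] : trimmed;
}
```

The author name (or the opening sentence) could then be passed along
with the print pub date as the search terms.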
Bob Duncan
~!~!~!~!~!~!~!~!~!~!~!~!~
Robert E. Duncan
Systems Librarian
Lafayette College
Easton, PA 18042
[log in to unmask]
http://library.lafayette.edu/