LISTSERV 16.5 - CODE4LIB Archives

On May 14, 2013, at 11:56 AM, Donna Campbell wrote:

> Cambridge Journals encourages new uses for journal data by releasing its API…
> http://journals.cambridge.org/action/stream?pageId=9048&level=2

Speaking of APIs for journals, I have been revisiting JSTOR's Data For Research (DFR) site at http://dfr.jstor.org. It is interesting because it lets you download datasets describing the results of JSTOR searches. Here's how:

1. go to dfr.jstor.org
2. sign in
3. search the (entire) JSTOR collection
4. refine, refine, and refine your search results
5. request a dataset
6. wait for an email message telling you the dataset is ready
7. download the dataset
8. munge the dataset to do cool things
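Step 8 is where the fun starts. Here is a minimal Python sketch of one kind of munging: summing per-article word frequencies into corpus-wide totals, the raw material for a word cloud. It assumes (hypothetically) that each article's frequencies arrive as a small CSV file with a header row followed by word,count rows; the actual file names and layout inside a DFR dataset may differ, so adjust the parser accordingly.

```python
import csv
import io
from collections import Counter

def read_wordcounts(csv_text):
    """Parse one article's word-frequency file.

    Assumes (hypothetically) a header row followed by rows of the
    form word,count; adapt this to the real layout of your dataset.
    """
    counts = Counter()
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the assumed header row
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        word, count = row[0].strip().lower(), row[1].strip()
        counts[word] += int(count)
    return counts

def corpus_counts(article_texts):
    """Sum per-article word frequencies into corpus-wide totals."""
    total = Counter()
    for text in article_texts:
        total.update(read_wordcounts(text))  # Counter.update adds counts
    return total

if __name__ == "__main__":
    # Made-up sample data standing in for two articles' files.
    article_a = "word,count\nlibrary,12\ncatalog,5\n"
    article_b = "word,count\nlibrary,3\nmetadata,7\n"
    totals = corpus_counts([article_a, article_b])
    print(totals.most_common(2))
```

In practice you would read each file from the unzipped dataset with open() instead of using inline strings.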

At most, each dataset will contain a file of citation information, a list of n-grams (bigrams, trigrams, and quadgrams) from each article, a list of statistically significant keywords from each article, and a list of the most frequently used words in each article.
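For anybody new to the term, an n-gram is simply a run of n consecutive words, so bigrams, trigrams, and quadgrams are runs of two, three, and four words. A minimal Python sketch of counting them, assuming naive lowercase whitespace tokenization (DFR's own tokenization rules are not described here and probably differ):

```python
from collections import Counter

def ngrams(words, n):
    """Return every run of n consecutive words as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def ngram_counts(text, n):
    """Count n-grams in text after naive lowercase whitespace tokenization."""
    words = text.lower().split()
    return Counter(ngrams(words, n))

if __name__ == "__main__":
    text = "the digital library and the digital humanities"
    for n in (2, 3, 4):  # bigrams, trigrams, quadgrams
        print(n, ngram_counts(text, n).most_common(1))
```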

From this data all sorts of things can be created:

* a tag/word cloud of each article or of the entire corpus
* a Simile timeline of published articles
* various citation formats
* exports into other databases
* concordances, after automatically downloading PDF versions of the articles
* services such as "find more like this one" can be implemented
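One way (certainly not the only one) to implement "find more like this one" is to treat each article's word frequencies as a vector and rank the other articles by cosine similarity. A Python sketch; the article identifiers and counts below are made up for illustration, and in real use they would come from the dataset's word-frequency lists:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two word-frequency Counters (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty article is similar to nothing
    return dot / (norm_a * norm_b)

def more_like_this(target_id, articles, k=3):
    """Return up to k (id, score) pairs most similar to articles[target_id]."""
    target = articles[target_id]
    scores = [
        (other_id, cosine(target, counts))
        for other_id, counts in articles.items()
        if other_id != target_id
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]

if __name__ == "__main__":
    # Hypothetical article IDs and word counts.
    articles = {
        "art1": Counter({"library": 10, "catalog": 4}),
        "art2": Counter({"library": 8, "catalog": 5, "metadata": 1}),
        "art3": Counter({"poetry": 9, "sonnet": 3}),
    }
    for article_id, score in more_like_this("art1", articles):
        print(article_id, round(score, 3))
```

Raw counts favor common words; weighting terms (e.g. by TF-IDF) or using the dataset's keyword lists instead would likely give better matches.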

Unfortunately, the search API (an SRU interface) has been removed from DFR, but the whole thing is still pretty cool.

--
Eric Lease Morgan, Digital Initiatives Librarian
University of Notre Dame

574/631-8604