I have begun working on a suite of software designed to enable a person to “read” the full text of hundreds (if not a thousand) articles from JSTOR simultaneously. I call this software the JSTOR Workset Browser. [1]
Using JSTOR’s Data For Research (DFR) service, anybody can search and browse the totality of JSTOR. [2] The reader can then create and download a “dataset” describing the items of interest they found. This dataset includes a citations.xml file. The Browser takes this citations.xml file as input and then: 1) harvests the content, 2) indexes it, 3) does some analysis against the content, 4) creates a few graphs illustrating characteristics of the dataset, and finally 5) generates a browsable “catalog” in the form of an HTML table. The table includes columns for authors, titles, and dates, as well as page lengths, numbers of words, and coefficients denoting the use of color words, “big” names, and “great” ideas. In the near future the Browser will support search as well as the generation of a report describing each reader-generated (curated) collection. You can see a number of collections created to date, including writings about Thoreau, Emerson, Dickinson, Longfellow, and Poe. [3]
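To make steps 3 and 5 a bit more concrete, here is a minimal sketch in Python. It is an illustration only, not the Browser's actual code: the `COLOR_WORDS` list, the record fields (`author`, `title`, `date`, `text`), and the definition of a coefficient as the fraction of words found in a lexicon are all assumptions made for the example.

```python
import html
import re

# Hypothetical lexicon; the Browser's real word lists are not described here.
COLOR_WORDS = {"red", "orange", "yellow", "green", "blue", "purple", "black", "white"}


def tokenize(text):
    """Lowercase the text and return its runs of ASCII letters as words."""
    return re.findall(r"[a-z]+", text.lower())


def coefficient(words, lexicon):
    """Return the fraction of words found in the lexicon (0.0 if there are no words)."""
    if not words:
        return 0.0
    return sum(1 for word in words if word in lexicon) / len(words)


def catalog_row(record):
    """Build one HTML table row from a record (a dict with assumed field names)."""
    words = tokenize(record["text"])
    cells = [
        record["author"],
        record["title"],
        record["date"],
        str(len(words)),
        f"{coefficient(words, COLOR_WORDS):.3f}",
    ]
    return "<tr>" + "".join(f"<td>{html.escape(cell)}</td>" for cell in cells) + "</tr>"


def catalog(records):
    """Build a simple HTML table with one row per record."""
    headings = ("author", "title", "date", "words", "color")
    header = "<tr>" + "".join(f"<th>{h}</th>" for h in headings) + "</tr>"
    rows = [catalog_row(record) for record in records]
    return "<table>\n" + "\n".join([header] + rows) + "\n</table>"


if __name__ == "__main__":
    # Made-up sample record, used only to exercise the functions above.
    sample = [{
        "author": "A. Reader",
        "title": "Ponds & Woods",
        "date": "1900",
        "text": "The red door, the Blue sky",
    }]
    print(catalog(sample))
```

A real catalog would compute the same kind of ratio for each lexicon ("big" names, "great" ideas, and so on) and add one column per coefficient.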
The ultimate goal is for the Browser, combined with similar tools designed to work against the HathiTrust and/or EEBO-TCP, to enable students and scholars to do research against massive amounts of content quickly and easily. [4, 5]
I’m looking for additional sample content. If you create a dataset from DFR, then send me the citations.xml file, and I will use it as input for the Browser. “Wanna play?”
[1] Browser on GitHub - http://bit.ly/jstor-workset-browser
[2] Data For Research - http://dfr.jstor.org
[3] sample collections - http://dh.crc.nd.edu/sandbox/jstor-workset-browser/
[4] HathiTrust Workset Browser - https://github.com/ericleasemorgan/HTRC-Workset-Browser
[5] EEBO-TCP Workset Browser - https://github.com/ericleasemorgan/EEBO-TCP-Workset-Browser
—
Eric Lease Morgan, Librarian