I don't have anything for you, but I wanted to say that the project sounds
severely cool!

Best regards,

*Jason Bengtson, MLIS, MA*
Innovation Architect
*Houston Academy of Medicine-The Texas Medical Center Library*
1133 John Freeman Blvd
Houston, TX 77030
http://library.tmc.edu/
www.jasonbengtson.com

On Sat, Jul 11, 2015 at 11:06 AM, Eric Lease Morgan <[log in to unmask]> wrote:

> I have begun working on a suite of software designed to enable a person to
> “read” the full text of hundreds (if not a thousand) articles from JSTOR
> simultaneously, and I call this software the JSTOR Workset Browser. [1]
>
> Using JSTOR’s Data For Research service, it is possible for anybody to
> first search & browse the totality of JSTOR. [2] The reader is then able to
> create and download a “dataset” describing found items of interest. This
> dataset includes a citations.xml file. The Browser takes this citations.xml
> file as input and then: 1) harvests the content, 2) indexes it, 3) does
> some analysis against the content, 4) creates a few graphs illustrating
> characteristics of the dataset, and finally 5) generates a browsable
> “catalog” in the form of an HTML table. The table includes columns for
> things like authors, titles, dates as well as page lengths, number of
> words, and coefficients denoting the use of color words, “big” names, and
> “great” ideas. In the near future the Browser will support search as well
> as the generation of a report describing each reader-generated (curated)
> collection. You can see a number of collections created to date, including
> writings about Thoreau, Emerson, Dickinson, Longfellow, and Poe. [3]
>
> Combined with similar tools designed to work against the HathiTrust and/or
> EEBO-TCP, the ultimate goal is to enable students and scholars to easily do
> research against massive amounts of content quickly and easily. [4, 5]
>
> I’m looking for additional sample content. If you create a dataset from
> DFR, then send me the citations.xml file, and I will use it as input for
> the Browser. “Wanna play?”
>
>
> [1] Browser on GitHub - http://bit.ly/jstor-workset-browser
> [2] Data For Research - http://dfr.jstor.org
> [3] sample collections - http://dh.crc.nd.edu/sandbox/jstor-workset-browser/
> [4] HathiTrust Workset Browser - https://github.com/ericleasemorgan/HTRC-Workset-Browser
> [5] EEBO-TCP Workset Browser - https://github.com/ericleasemorgan/EEBO-TCP-Workset-Browser
>
>
> --
> Eric Lease Morgan, Librarian
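
For anyone wondering what the catalog step in the quoted message might look like in practice, here is a rough sketch in Python. It is not the Browser's actual code. The layout of citations.xml (an `article` element with `author`, `title`, and `pubdate` children), the `COLOR_WORDS` list, and the idea that a "coefficient" is simply the share of words found in that list are all assumptions made for illustration. Harvesting, indexing, and graphing are left out, and the full text is passed in by hand.

```python
import html
import re
import xml.etree.ElementTree as ET

# Hypothetical lexicon; the message does not say which words the Browser uses.
COLOR_WORDS = {"red", "green", "blue", "yellow", "black", "white"}


def color_coefficient(text):
    """Return the share of words in text that are color words (0.0 if empty)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(w in COLOR_WORDS for w in words) / len(words)


def read_citations(xml_text):
    """Parse an assumed <citations><article>...</article></citations> layout."""
    root = ET.fromstring(xml_text)
    rows = []
    for article in root.iter("article"):
        rows.append({
            "author": article.findtext("author", default=""),
            "title": article.findtext("title", default=""),
            "date": article.findtext("pubdate", default=""),
        })
    return rows


def build_catalog(rows, texts):
    """Render one HTML table row per article, adding word count and coefficient."""
    header = ["author", "title", "date", "words", "color"]
    lines = ["<table>",
             "<tr>" + "".join(f"<th>{h}</th>" for h in header) + "</tr>"]
    for row, text in zip(rows, texts):
        cells = [row["author"], row["title"], row["date"],
                 str(len(text.split())), f"{color_coefficient(text):.3f}"]
        lines.append(
            "<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in cells) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = ("<citations><article><author>A. Reader</author>"
              "<title>On Walden</title><pubdate>1999</pubdate></article></citations>")
    print(build_catalog(read_citations(sample), ["green leaves fall"]))
```

A real citations.xml from DFR may well use different element names, so `read_citations` would need adjusting against an actual file.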