Using a tool of mine called the Distant Reader, I have refined a process for indexing collections such as Code4Lib Journal. [1]

The Distant Reader harvests an arbitrary number of user-supplied files or links to files, transforms them into plain text files, and runs numerous natural language processing routines against them. The result is a large set of indexes that can be used to "read" the given corpus. I have made available the about pages of a number of such indexes:

  * Code4Lib Journal -
     o 1,234,348 words; 303 documents
     o all articles from a journal named Code4Lib Journal

  * Cultural Analytics -
     o 318,287 words; 33 documents
     o all articles from a journal named Cultural Analytics

  * Plato -
     o 929,704 words; 24 documents
     o the complete works of Plato

  * aesthetics -
     o 2,296,890 words; 37 documents
     o books classified as the philosophy of art
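
The word and document counts listed above are the sort of thing that falls out of the process once everything has been transformed into plain text. What follows is only a minimal sketch of that idea, not the Distant Reader's actual code: the function names (tokenize, summarize, read_corpus) and the simple regular-expression tokenizer are assumptions of mine, made up for illustration.

```python
import re
from collections import Counter
from pathlib import Path


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed, naive tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def summarize(corpus):
    """Given a mapping of document name -> plain text, count documents, words, and word frequencies."""
    frequencies = Counter()
    sizes = {}
    for name, text in corpus.items():
        tokens = tokenize(text)
        sizes[name] = len(tokens)
        frequencies.update(tokens)
    return {
        "documents": len(corpus),
        "words": sum(sizes.values()),
        "sizes": sizes,
        "frequencies": frequencies,
    }


def read_corpus(directory):
    """Read every *.txt file in a (hypothetical) directory of already-transformed plain text files."""
    return {
        path.name: path.read_text(encoding="utf-8", errors="replace")
        for path in sorted(Path(directory).glob("*.txt"))
    }
```

Given a directory of plain text files, something like summarize(read_corpus("./txt")) would return the number of documents, the total number of words, and a frequency table from which the most common words can be listed. The directory name is hypothetical, and counts computed this way will differ from the ones above because the tokenizer is so simple.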

At an upcoming high performance computing conference, I -- with a number of colleagues from Indiana University -- will be presenting a poster about the Distant Reader, and we will be taking part in a hack-a-thon. [2, 3] If you too would like to hack against the output of the Distant Reader, then drop me a line.

[1] Distant Reader -
[2] high performance computing conference -
[3] hack-a-thon invitation -

Eric Lease Morgan
Digital Initiatives Librarian, Navari Family Center for Digital Scholarship
Hesburgh Libraries

University of Notre Dame
250E Hesburgh Library
Notre Dame, IN 46556
o: 574-631-8604