I submitted a Digital Public Library of America (DPLA) beta-sprint proposal that describes, illustrates, and demonstrates how text mining and other text analysis techniques could be applied to library collections. From the Executive Summary:
Use & understand is an evolutionary step in the processes and
functions of a library. These processes and functions enable the
reader to ask and answer questions of large and small sets of
documents relatively easily. Through the use of various text
mining techniques, the reader can grasp quickly the content of
documents, extract some of their meaning, and evaluate them more
thoroughly when compared to the traditional application of
metadata. Some of these processes and functions include:
word/phrase frequency lists, concordances, histograms
illustrating the location of words/phrases in a text, network
diagrams illustrating what authors say "in the same breath" when
they mention a given word, plotting publication dates on a
timeline, measuring the weight of a concept in a text, evaluating
texts based on parts-of-speech, supplementing texts with
Wikipedia articles, and plotting place names on a world map.
http://bit.ly/ojWmzN
For more information about the DPLA, see -- http://bit.ly/irjzqO
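To make two of the techniques named in the summary concrete, here is a minimal sketch of a word frequency list and a concordance in Python. It is not the code behind the proposal; the function names (tokenize, word_frequencies, concordance), the crude regular-expression tokenizer, and the sample text are all assumptions chosen for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (a crude, assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text, top=5):
    """Return the `top` most common words in the text, paired with their counts."""
    return Counter(tokenize(text)).most_common(top)


def concordance(text, keyword, width=2):
    """Return each occurrence of `keyword` with up to `width` words of context per side."""
    tokens = tokenize(text)
    target = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token == target:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append(" ".join(left + [token] + right))
    return lines


# Hypothetical sample text, standing in for a document from a collection.
sample = (
    "The library collects books. The reader uses the library "
    "to find books, and the library helps the reader understand them."
)

print(word_frequencies(sample, top=3))
for line in concordance(sample, "library"):
    print(line)
```

A real application would likely remove stop words such as "the" before counting, and would read documents from files rather than from an inline string.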
--
Eric Lease Morgan
University of Notre Dame