I will be facilitating a bootcamp at ELAG 2018 called "Text mining: Beyond the basics". [0] Below is an outline of the activities:
* What is text mining, and why should I care?
* Creating a corpus
* Creating a plain text version of a corpus with Tika
* Using Voyant Tools to do some “distant” reading
* Using a concordancer, like AntConc, to facilitate searching for keywords in context
* Creating a simple word list with a text editor
* Cleaning & analyzing word lists with OpenRefine
* Charting & graphing word lists with Tableau Public
* Increasing meaning by extracting parts-of-speech with the Stanford POS Tagger
* Increasing meaning by extracting named entities with the Stanford NER
* Identifying themes and clustering documents using MALLET
By the end of the workshop you will have increased your ability to:
* identify patterns, anomalies, and trends in a corpus
* practice both “distant” and “scalable” reading
* enhance & complement your ability to do “close” reading
* use & understand any corpus of poetry or prose
The workshop is operating system agnostic, and all the
software is freely available on the ‘Net, or already
installed on your computer. Active participation requires
zero programming, but participants must bring their own computer,
and they must be willing to learn how to use a text editor
such as NotePad++ or BBEdit.
I have also begun to post parts of the bootcamp's workbook on my blog. [1]
Hope to see you in Prague!
[0] ELAG bootcamp - https://www.elag2018.org/bootcamps/#text_mining
[1] blog - http://infomotions.com/blog/
--
Eric Morgan