I will be facilitating a bootcamp at ELAG 2018 called "Text mining: Beyond the basics". [0] Below is an outline of the activities:

    * What is text mining, and why should I care?
    * Creating a corpus
    * Creating a plain text version of a corpus with Tika
    * Using Voyant Tools to do some “distant” reading
    * Using a concordance, like AntConc, to facilitate searching for keywords in context
    * Creating a simple word list with a text editor
    * Cleaning & analyzing word lists with OpenRefine
    * Charting & graphing word lists with Tableau Public
    * Increasing meaning by extracting parts-of-speech with the Stanford POS Tagger
    * Increasing meaning by extracting named entities with the Stanford NER
    * Identifying themes and clustering documents using MALLET

  By the end of the workshop you will have increased your ability to:

    * identify patterns, anomalies, and trends in a corpus
    * practice both “distant” and “scalable” reading
    * enhance & complement your ability to do “close” reading
    * use & understand any corpus of poetry or prose

  The workshop is operating system agnostic, and all the
  software is freely available on the ‘Net, or already
  installed on your computer. Active participation requires
  zero programming, but participants must bring their own computer,
  and they must be willing to learn how to use a text editor
  such as NotePad++ or BBEdit.

I have also begun to post parts of the bootcamp's workbook on my blog. [1]

'Hope to see you in Prague.

[0] ELAG bootcamp - https://www.elag2018.org/bootcamps/#text_mining
[1] blog - http://infomotions.com/blog/

--
Eric Morgan