LISTSERV 16.5 - CODE4LIB Archives

Dear Yu-lan,

This looks interesting! Too bad, "time is not on my side" (時不我予) has become my real excuse, since members of my extended family have had serious health problems. I have been wrapping up my unfinished projects with the goal of retiring in 2-3 years.

Thanks,
Bie-hwa

-----Original Message-----
From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of Haitz, Lisa (haitzlm)
Sent: Thursday, November 09, 2017 10:45 AM
To: [log in to unmask]
Subject: Re: [CODE4LIB] hands-on workshop on natural language processing & text mining

I love it!

On 11/9/17, 1:13 PM, "Code for Libraries on behalf of Eric Lease Morgan" <[log in to unmask] on behalf of [log in to unmask]> wrote:

    I’m thinking about a hands-on workshop on natural language processing & text mining, below, and your feedback is desired.  —ELM
    
    
    Natural language processing & text mining using freely available tools: "No programming necessary"
    
    This text outlines a hands-on natural language processing & text mining workshop.
    
    It is possible to do simple & rudimentary natural language processing & text mining with a set of freely available tools. No programming is necessary. This workshop facilitates hands-on exercises demonstrating how this can be done. By participating in this workshop, students & researchers will be able to:
    
     * identify patterns, anomalies, and trends in their texts
     * practice both "distant" and "scalable" reading
     * enhance & complement their ability to do "close" reading
     * use & understand a corpus of poetry or prose at scale
    
    Activities in the workshop include:
    
     * learning what natural language processing is, and why you should care
     * articulating a research question
     * creating a corpus
     * creating a plain text version of a corpus with Tika [1]
     * using Voyant Tools to do some "distant" reading [2]
     * using a concordance (AntConc) to facilitate searching keywords in context [3]
     * creating a simple word list with a text editor
     * cleaning & analyzing word lists with OpenRefine [4]
     * charting & graphing word lists with Tableau Public [5]
     * increasing meaning by extracting parts-of-speech with the Stanford POS Tagger [6]
     * increasing meaning further by extracting named entities with the Stanford NER [7]
     * identifying themes and clustering documents using MALLET [8]
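
The workshop itself requires no programming, but for readers curious about what a word list and a keyword-in-context (concordance) display amount to under the hood, here is a minimal Python sketch. It is not how AntConc or any other tool above is implemented. The function names (`tokenize`, `word_list`, `kwic`) and the sample sentence are made up for illustration, and the tokenizer is deliberately crude.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_list(text, top=10):
    """Return the `top` most frequent tokens as (word, count) pairs."""
    return Counter(tokenize(text)).most_common(top)

def kwic(text, keyword, width=2):
    """Return (left context, keyword, right context) for each occurrence.

    `width` is the number of tokens of context kept on each side.
    """
    tokens = tokenize(text)
    target = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits

# Hypothetical sample text, standing in for a real corpus file.
sample = "The cat sat on the mat and the dog sat on the log."

if __name__ == "__main__":
    for word, count in word_list(sample, top=3):
        print(f"{count}\t{word}")
    for left, key, right in kwic(sample, "sat"):
        print(f"{left:>12} [{key}] {right}")
```

With a real corpus, one would read the plain text files produced in the Tika step and pass their contents to these functions in place of `sample`.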
    
    Anybody with sets of texts can benefit from this workshop. Any corpus of textual content is apropos: journal articles, books, the complete run of a magazine, blog postings, Tweets, press releases, conference proceedings, websites, poetry, etc. This workshop is platform agnostic (Windows, Linux, Macintosh). All the software used in this workshop is freely available on the 'Net, or it is already installed on one's computer. Active participation requires zero programming, but students must bring their own computers, and they must not be afraid of their computer's command line interface.
    
    This workshop will not make participants experts in natural language processing, but it will empower them to make better sense of large sets of textual information.
    
    [1] Tika - http://tika.apache.org
    [2] Voyant - http://voyant-tools.org
    [3] AntConc - http://www.laurenceanthony.net/software/antconc/
    [4] OpenRefine - http://openrefine.org
    [5] Tableau Public - https://public.tableau.com/
    [6] POS Tagger - https://nlp.stanford.edu/software/tagger.shtml
    [7] NER - https://nlp.stanford.edu/software/CRF-NER.shtml
    [8] MALLET - http://mallet.cs.umass.edu