And a word cloud:
http://www.wordle.net/show/wrdl/3157008/code4lib_2011_IRC_logs

On Sat, Feb 12, 2011 at 10:13 AM, Eric Lease Morgan wrote:
> I have written a few hacks allowing me to do rudimentary text mining
> against the logs. [1] From readme.txt:
>
>   This directory contains a number of files and scripts allowing
>   one to do a bit of text mining against the Code4Lib conference
>   IRC log files for 2011. This is just a beginning, and the
>   directory includes:
>
>     * irclog.txt - the raw log file downloaded from
>       http://irc.code4lib.org/c4l11/static/logs/irclog
>
>     * log2db.pl - reads the raw log and outputs a tab-delimited
>       file with three columns (date, name, text)
>
>     * irclog.db - the output of log2db.pl
>
>     * count.pl - outputs the number of names (n), increases (i),
>       decreases (d), URLs (u), and commands (c) found in the log;
>       useful for seeing what is hot and what is not.
>
>     * ngrams.pl - given an integer (n), outputs the most frequent
>       n-length phrases; useful to see what words and phrases are
>       used most frequently
>
>     * concordance.pl - a KWIC (keyword in context) index; the
>       simplest of search engines
>
>     * readme.txt - this file
>
>   Using these tools one can see that:
>
>     * Zoia had the most to say
>     * mbklein's karma was increased the most
>     * Zoia's karma was decreased the most
>     * the most popular URL passed around regarded social activities
>     * we tried to sing as many as 196 songs, closely followed by anagrams
>     * 28 of the songs weren't found
>     * live streams were mentioned frequently
>
> I have to go shovel snow now...
>
> [1] initial hacks - http://bit.ly/gMO4op
>
> --
> Eric Lease Morgan
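For anyone who wants to reproduce the karma tallies without Perl, here is a rough Python sketch of that kind of count. It is not how count.pl works (I haven't read it); it assumes the bot's karma syntax is a name immediately followed by "++" or "--" (e.g. "mbklein++"), and that you feed it the message text of each line. The function name count_karma is made up for illustration.

```python
import re
from collections import Counter

# ASSUMPTION: karma is a word immediately followed by "++" or "--" and then
# whitespace or end of message (e.g. "mbklein++"). This is a guess at the
# bot's convention, not something taken from count.pl.
KARMA = re.compile(r"(\w+)(\+\+|--)(?!\S)")


def count_karma(texts):
    """Tally karma increases and decreases per (lowercased) name.

    Returns a pair of Counters: (increases, decreases).
    """
    up, down = Counter(), Counter()
    for text in texts:
        for name, op in KARMA.findall(text):
            # Route the hit to the right tally; names are case-folded so
            # "Zoia--" and "zoia--" count against the same nick.
            (up if op == "++" else down)[name.lower()] += 1
    return up, down
```

Calling up.most_common(1) and down.most_common(1) on the result would answer the "whose karma went up/down the most" questions. Note the pattern will also count things like "C++" as karma, which is a known weakness of this naive approach.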
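Along the same lines, a minimal Python sketch of the n-gram idea behind ngrams.pl: slide a window of n words over each message and count the resulting phrases. It assumes the tab-delimited (date, name, text) layout the readme gives for irclog.db; the helper names read_texts and top_ngrams and the crude tokenizer are my own, not the Perl script's.

```python
import re
from collections import Counter


def read_texts(path):
    """Yield the text column of a tab-delimited (date, name, text) file.

    ASSUMPTION: the layout described in the readme for irclog.db.
    """
    with open(path, encoding="utf-8") as fh:
        for row in fh:
            # maxsplit=2 keeps any tabs inside the message text intact
            parts = row.rstrip("\n").split("\t", 2)
            if len(parts) == 3:
                yield parts[2]


def top_ngrams(texts, n, k=10):
    """Return the k most frequent n-word phrases as (phrase, count) pairs."""
    counts = Counter()
    for text in texts:
        # Crude tokenizer: lowercase runs of letters, digits and apostrophes.
        words = re.findall(r"[a-z0-9']+", text.lower())
        # Slide a window of n words across the message; messages shorter
        # than n words contribute nothing.
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts.most_common(k)
```

Something like top_ngrams(read_texts("irclog.db"), 2) would list the most common two-word phrases. Counting per message (rather than across the whole log as one stream) avoids gluing the end of one speaker's line to the start of the next.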
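And a sketch of a KWIC (keyword in context) concordance in the spirit of concordance.pl: find each whole-word, case-insensitive hit and return a fixed amount of text on either side so the keyword can be lined up in a column. The function name concordance and the 30-character default width are illustrative choices, not the script's actual interface.

```python
import re


def concordance(lines, keyword, width=30):
    """Yield (left, match, right) for each whole-word, case-insensitive hit.

    left and right hold up to `width` characters of surrounding context.
    Assumes `keyword` is an ordinary word, so the \\b anchors behave.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for line in lines:
        line = line.rstrip("\n")
        for m in pattern.finditer(line):
            left = line[max(0, m.start() - width):m.start()]
            right = line[m.end():m.end() + width]
            yield left, m.group(0), right
```

To print it as a classic KWIC listing, right-justify the left context, e.g. print(f"{left:>30} {match} {right}") for each tuple, so every occurrence of the keyword falls in the same column. Any iterable of strings works as input, including an open file handle on irclog.txt.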