And a word cloud:
http://www.wordle.net/show/wrdl/3157008/code4lib_2011_IRC_logs
On Sat, Feb 12, 2011 at 10:13 AM, Eric Lease Morgan <[log in to unmask]> wrote:
> I have written a few hacks allowing me to do rudimentary text mining
> against the logs. [1] From readme.txt:
>
> This directory contains a number of files and scripts allowing
> one to do a bit of text mining against the Code4Lib conference
> IRC log files for 2011. This is just a beginning, and the
> directory includes:
>
> * irclog.txt - the raw log file downloaded from
> http://irc.code4lib.org/c4l11/static/logs/irclog
>
> * log2db.pl - reads the raw log and outputs a tab-delimited
> file with three columns (date, name, text)
>
> * irclog.db - the output of log2db.pl
>
> * count.pl - outputs the number of names (n), karma increases (i),
> karma decreases (d), URLs (u), and commands (c) found in the log;
> useful for seeing what is hot and what is not.
>
> * ngrams.pl - given an integer (n), outputs the most frequent
> n-length phrases; useful to see what words and phrases are
> used most frequently
>
> * concordance.pl - a KWIC (keyword in context) index; the simplest of search engines
>
> * readme.txt - this file
>
> Using these tools one can see that:
>
> * Zoia had the most to say
> * mbklein's karma was increased the most
> * Zoia's karma was decreased the most
> * the most popular URL passed around concerned social activities
> * we tried to sing as many as 196 songs, closely followed by anagrams
> * 28 of the songs weren't found
> * live streams were mentioned frequently
>
>
> I have to go shovel snow now...
>
> [1] initial hacks - http://bit.ly/gMO4op
>
> --
> Eric Lease Morgan
>
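For anyone who wants to try the same kind of analysis without the original Perl scripts, here is a minimal Python sketch of three techniques the readme describes: reading the tab-delimited (date, name, text) file, counting n-grams, and building a KWIC concordance, plus a karma counter. It is not a port of Eric's scripts, and the function names are hypothetical. It assumes irclog.db lines are tab-delimited as described, and it assumes karma is written as nick++ / nick-- (a common IRC bot convention, not something the readme states). The sample lines and nicks are made up for illustration.

```python
import re
from collections import Counter


def read_log(lines):
    """Yield (date, name, text) tuples from tab-delimited lines (the irclog.db layout described above)."""
    for line in lines:
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) == 3:  # skip malformed lines
            yield tuple(parts)


def ngrams(texts, n, top=10):
    """Return the `top` most frequent n-word phrases across all messages."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z0-9']+", text.lower())
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts.most_common(top)


def karma(texts):
    """Count 'nick++' and 'nick--' tokens (ASSUMED karma syntax)."""
    up, down = Counter(), Counter()
    for text in texts:
        for nick, op in re.findall(r"(\w+)(\+\+|--)", text):
            (up if op == "++" else down)[nick.lower()] += 1
    return up, down


def concordance(texts, keyword, width=30):
    """A tiny KWIC index: each hit with up to `width` characters of context on each side."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    hits = []
    for text in texts:
        for m in pattern.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            # right-align the left context so the keyword lines up in a column
            hits.append(f"{left:>{width}} {m.group(0)} {right}")
    return hits


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from the real log.
    sample = [
        "2011-02-08\talice\tthe live stream is up",
        "2011-02-08\tbob\talice++ thanks for the live stream",
        "2011-02-08\tcarol\tbob-- the live stream froze",
    ]
    texts = [text for _, _, text in read_log(sample)]
    print(ngrams(texts, 2, top=3))
    print(karma(texts))
    for hit in concordance(texts, "stream", width=15):
        print(hit)
```

Against the real file one would pass `open("irclog.db", encoding="utf-8")` to `read_log` instead of the sample list; the encoding is also an assumption.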