Thanks for this summary. Enjoying your blog as well.
To those mentioning the need for training sets (true!), the Harvard Open Metadata project seems like a gold mine of a training set. 12 million classified book/library records, MARC format. I've been wanting to try some machine learning experiments on it, but alas, no time)..