The minimum word length and the stop word list are both configurable at
run time. The exclusion of words that appear in more than 50% of the rows
is a compile-time setting (or you can simply use boolean mode, which does
not apply that threshold). Here are the settings to be aware of:
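
A rough sketch of the run-time part, assuming this is MySQL's MyISAM
full-text indexing (the 50% rule and boolean mode suggest it, but that is
my assumption). The stop word file path and the value 2 are placeholders,
not recommendations:

```ini
# my.cnf -- assumes MySQL with MyISAM FULLTEXT indexes
[mysqld]
# Index words of 2 characters or more (the stock default is 4).
ft_min_word_len = 2
# Use a custom stop word list; the path here is hypothetical.
# Setting this to an empty string disables stop word filtering.
ft_stopword_file = /etc/mysql/ft_stopwords.txt
```

Under the same assumption, the server has to be restarted after changing
either setting, and existing FULLTEXT indexes have to be rebuilt (for
example with REPAIR TABLE tbl_name QUICK) before the new values apply.
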
On Jun 1, 2009, at 11:13 AM, Mike Taylor wrote:
> However, all of these oddities -- over eager stop-list, ignoring short
> words, not counting words in more than half the rows -- can be sorted
> out by configuration options.