MarkLogic has community and academic editions that do a really good job of co-occurrence analysis. We use it for all our XML projects.
On Oct 1, 2012, at 7:55 PM, Bess Sadler <[log in to unmask]> wrote:
For a full-text search system we're prototyping, we are being asked to provide term co-occurrence analysis. I'm not very familiar with this concept, so maybe someone on the list can describe it better, but I believe that what is wanted is to be able to query a text corpus for a given word, and to receive in return a list of words that co-occur with the search term, along with some indication of how often those words co-occur. Something like this IBM Many Eyes demo: http://www-958.ibm.com/software/data/cognos/manyeyes/visualizations/clint-eastwood-applause-lines-at-r (but we're not necessarily looking for a visualization, just a way to do the query).
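For illustration only, here is a minimal Python sketch of the kind of query described above: given a term, count the words that appear within a few tokens of it across a small in-memory corpus. This is not a Solr or MarkLogic recipe. The function name `cooccurring_terms`, the `window` parameter, the simple regex tokenizer, and the two sample documents are all assumptions made up for this example.

```python
import re
from collections import Counter


def cooccurring_terms(documents, term, window=5):
    """Count words appearing within `window` tokens of `term`.

    `documents` is an iterable of plain-text strings. Matching is
    case-insensitive, and the tokenizer is deliberately naive
    (hypothetical choice for this sketch).
    """
    term = term.lower()
    counts = Counter()
    for text in documents:
        # Lowercase and split into runs of letters, digits, and apostrophes.
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        for i, token in enumerate(tokens):
            if token != term:
                continue
            # Look at the neighbours on both sides of this occurrence.
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


# Made-up sample corpus, for demonstration only.
docs = [
    "The library catalog supports full-text search.",
    "Full-text search over the catalog uses an index.",
]

# Most frequent neighbours of "search" within two tokens on either side.
print(cooccurring_terms(docs, "search", window=2).most_common(3))
```

A real system would more likely compute these counts from the search index (term vectors, facets, or similar) rather than rescanning the text, and would normally filter stopwords and normalize the raw counts with an association measure.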
Some Google searching gives me lots of scholarly articles from computational linguistics and humanities computing, but nothing like "here's a recipe for how to do this in Solr," which is what I would really love.
Has anyone done this? How did you approach it? Are there tools you can recommend? Articles or books I should read?
Many thanks in advance,