For a full-text search system we're prototyping, we have been asked to provide term co-occurrence analysis. I'm not very familiar with the concept, so perhaps someone on the list can describe it better, but I believe what is wanted is the ability to query a text corpus for a given word and get back a list of the words that co-occur with it, along with some indication of how often each one co-occurs. Something like this IBM Many Eyes demo: http://www-958.ibm.com/software/data/cognos/manyeyes/visualizations/clint-eastwood-applause-lines-at-r (we're not necessarily looking for a visualization, just a way to do the query).
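
To make the request concrete, here is a minimal sketch of the kind of query I have in mind, written in plain Python rather than Solr. It assumes that "co-occur" means "appear in the same document", which may not be the definition our users want (a sentence or a fixed-size window are other possibilities). The function name `cooccurring_terms`, the naive tokenizer, and the sample documents are all made up for illustration.

```python
import re
from collections import Counter


def cooccurring_terms(documents, term, top_n=10):
    """Return (word, count) pairs for words sharing a document with `term`.

    Assumption: co-occurrence is counted per document, so a word is
    counted at most once for each document that also contains `term`.
    """
    term = term.lower()
    counts = Counter()
    for text in documents:
        # Naive tokenization: lowercase runs of letters and apostrophes.
        # A real system would reuse the search index's own analyzer.
        tokens = set(re.findall(r"[a-z']+", text.lower()))
        if term in tokens:
            # Count every other distinct word in this document once.
            counts.update(tokens - {term})
    return counts.most_common(top_n)


# Hypothetical sample corpus, for illustration only.
docs = [
    "the quick brown fox",
    "the lazy brown dog",
    "a quick red fox",
]
result = cooccurring_terms(docs, "fox")
print(result)
```

In this toy corpus "quick" shares two documents with "fox" and so gets a count of 2, while words that appear only in the "dog" document are not returned at all. Raw counts like these favor very common words, so I assume a real answer would also involve stopword filtering or some kind of weighting.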

Some Google searching turns up lots of scholarly articles from computational linguistics and humanities computing, but nothing like "here's a recipe for how to do this in Solr", which is what I would really love.

Has anyone done this? How did you approach it? Are there tools you can recommend? Articles or books I should read? 

Many thanks in advance, 
Bess