Alan, if you are looking for data mining software that runs well on Hadoop, I would definitely recommend looking into Apache Mahout. [1] It is specifically focused on categorization and clustering, and these algorithms tend to work well in the distributed architecture of a Hadoop-based system. If you are looking for parsers, taggers, or tokenizers, then a different system (GATE, OpenNLP, or UIMA) would be more appropriate.

-Aaron

[1] http://mahout.apache.org

On Aug 27, 2013, at 7:47 PM, Alan Darnell wrote:

> Do any of these work in Hadoop using MapReduce as a programming model? It seems like Hadoop would be a natural use case for text mining and analysis.
>
> Alan
>
> On Aug 27, 2013, at 7:44 PM, "Riley, Jenn" wrote:
>
>> This is still command-line, but Mallet is heavily used in the DH
>> community: http://mallet.cs.umass.edu/. I think MONK
>> (http://monkproject.org/) has a UI, but I'm not overly familiar with its
>> features.
>>
>> Jenn
>>
>> --------------------------------
>> Jenn Riley
>> Head, Carolina Digital Library and Archives
>> The University of North Carolina at Chapel Hill
>> http://cdla.unc.edu/
>> http://www.lib.unc.edu/users/jlriley
>>
>> (919) 843-5910
>>
>> On 8/27/13 11:24 AM, "Eric Lease Morgan" wrote:
>>
>>> What sorts of text mining software do y'all support / use in your
>>> libraries?
>>>
>>> We here in the Hesburgh Libraries at the University of Notre Dame have
>>> all but opened a place called the Center for Digital Scholarship. We are
>>> / will be providing a number of different services to a number of
>>> different audiences. These services include, but are not necessarily
>>> limited to:
>>>
>>> * data management consultation
>>> * data analysis and visualization
>>> * geographic information systems support
>>> * text mining investigations
>>> * referrals to other "centers" across campus
>>>
>>> I am expected to support the text mining investigations. I have
>>> traditionally used open source tools to do my work. Many of these tools
>>> require some sort of programming to exploit. To some degree, I am
>>> expected to mount text mining software on our local Windows and Macintosh
>>> computers here in our Center. I am familiar with the lists of tools
>>> available at Bamboo as well as Hermeneuti.ca. [0, 1] TAPoRware is good
>>> too, but a bit long in the tooth. [2]
>>>
>>> Do you know of other sets of tools to choose from? Are you familiar with
>>> SAS® Text Analytics, STATISTICA Data Miner, or RapidMiner? [3, 4, 5]
>>>
>>> [0] Bamboo DiRT - http://dirt.projectbamboo.org
>>> [1] Hermeneuti.ca - http://hermeneuti.ca/voyeur/tools
>>> [2] TAPoRware - http://taporware.ualberta.ca
>>> [3] Text Analytics - http://www.sas.com/text-analytics/
>>> [4] Data Miner - http://www.statsoft.com/Products/STATISTICA/Data-Miner/
>>> [5] RapidMiner - http://rapid-i.com/content/view/181/190/
>>>
>>> --
>>> Eric Lease Morgan, Digital Initiatives Librarian
>>> Hesburgh Libraries
>>> University of Notre Dame
>>>
>>> 574/631-8604