
On Jan 27, 2007, at 10:23 AM, Eric Lease Morgan wrote:

> Do y'all know of any open source text summarizers?


Thank you for the prompt replies. In the end I used a combination
of summarizers:

   1. First I used the Perl module Lingua::EN::Keywords. This works
quite well. Given some text it returns five words it thinks are most
significant.

   2. Second, I used Lingua::EN::Summarizer. Given a text, it returns
one or two sentences it thinks are relevant. This does not work as
well as Lingua::EN::Keywords.

   3. OTS (Open Text Summarizer), like Lingua::EN::Keywords, returns
a list of words it thinks are relevant. The results are so-so.

My real goal was to create a list of tags to associate with full-text
files. To create my tags I:

   1. Got a list of words from Lingua::EN::Keywords.

   2. Added the words from Lingua::EN::Summarizer.

   3. Added the words from OTS.

   4. Added the words from the file's title.

   5. Added the words from the file's author.

   6. Normalized all the words (lowercased, removed punctuation, etc.).

   7. Removed duplicates.

   8. Removed stop words.
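
For anyone wanting to try something similar, steps 6 through 8 can be sketched roughly as follows. This is a Python illustration rather than the Perl actually used, and the function name build_tags, the tiny stop-word list, and the sample words are all made up for the example. It assumes the word lists from the summarizers, title, and author have already been collected.

```python
import string

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "by", "in", "of", "the", "to"}


def build_tags(*word_sources):
    """Merge word lists into normalized, de-duplicated tags without stop words."""
    tags = []
    seen = set()
    for source in word_sources:
        for word in source:
            # Step 6: normalize by lowercasing and stripping surrounding punctuation.
            normalized = word.lower().strip(string.punctuation)
            # Steps 7 and 8: skip empty strings, stop words, and duplicates.
            if not normalized or normalized in STOP_WORDS or normalized in seen:
                continue
            seen.add(normalized)
            tags.append(normalized)
    return tags


# Made-up stand-ins for the output of the summarizers and the file metadata.
keywords = ["whale", "Ahab", "sea"]
summary_words = "The whale, the sea".split()
title_words = "Moby Dick".split()
author_words = "Herman Melville".split()

print(build_tags(keywords, summary_words, title_words, author_words))
```

Tags keep the order in which they were first seen, so words from the keyword extractor come before words from the title and author.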

Finally, I used Net::Delicious to upload 675 Alex Catalogue of
Electronic Texts links to del.icio.us. The results aren't too bad.
Heck, I certainly couldn't catalog 675 items that quickly. See:

   http://del.icio.us/infomotions/alex

No, it is not perfect, but it is certainly better than nothing!

--
Eric Lease Morgan
University Libraries of Notre Dame