Hi Edward,

I have been using two libraries that are based on the Rapid Automatic Keyword Extraction (RAKE) algorithm and the Natural Language Toolkit (NLTK) to derive keywords. So far, the results have been interesting but less than stellar.

https://pypi.org/project/multi-rake/
https://pypi.org/project/rake-nltk/

What I like about this approach is that it analyzes frequency and co-occurrence to return key phrases (not just the most frequent keywords), which may better represent the subject of the source text.
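
As a rough illustration of how that frequency and co-occurrence scoring works, here is a simplified sketch of the RAKE idea in plain Python. It is not the code from multi-rake or rake-nltk; the tiny stopword list, the function names, and the tokenization rule are all assumptions made for the example, and the real libraries handle stopwords and languages far more thoroughly.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real libraries ship much larger ones).
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "with",
}


def candidate_phrases(text):
    """Split text into candidate phrases, breaking at punctuation and stopwords."""
    phrases = []
    # Punctuation always ends a phrase, so split on it first.
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake_keywords(text, top_n=5):
    """Return up to top_n (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)    # how often each word appears
    degree = defaultdict(int)  # co-occurrence count, including the word itself
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    # Word score is degree divided by frequency; a phrase scores the sum of its words.
    word_score = {word: degree[word] / freq[word] for word in freq}
    scored = {" ".join(p): sum(word_score[w] for w in p) for p in phrases}
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


if __name__ == "__main__":
    sample = (
        "Keyword extraction is useful. "
        "Rapid automatic keyword extraction finds key phrases in text."
    )
    for phrase, score in rake_keywords(sample):
        print(f"{score:5.1f}  {phrase}")
```

Because a phrase's score is the sum of its word scores, longer phrases tend to rank highest, which may be part of why the output can look less than stellar on short texts such as abstracts.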

Best,

Ian

Ian Matzen
 He/Him/His
 Systems and Digital Initiatives Librarian
 Westfield State University
 Westfield, MA 01086-1630
 (413) 351 9178
[log in to unmask]<mailto:[log in to unmask]>|westfield.ma.edu<http://westfield.ma.edu/>

On Oct 22, 2020, at 2:25 PM, Edward M. Corrado <[log in to unmask]<mailto:[log in to unmask]>> wrote:

Hello,

I have a set of just over 60,000 theses and dissertations abstracts that I
want to automatically create keywords/topics from. Does anyone have any
recommendations for text mining or other tools to start with?

Regards,
Edward