Hi Edward,
I have been using two libraries based on the Rapid Automatic Keyword Extraction (RAKE) algorithm and the Natural Language Toolkit (NLTK) to derive keywords. So far, the results have been interesting but less than stellar.
https://pypi.org/project/multi-rake/
https://pypi.org/project/rake-nltk/
What I like about this approach is that it analyzes word frequency and co-occurrence to return key phrases (not just the most frequent keywords), which may better represent the subject of the source text.
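To make the frequency and co-occurrence idea concrete, here is a minimal sketch of RAKE-style scoring in plain Python. It is not the code from multi-rake or rake-nltk: the tiny stopword list and the function names (candidate_phrases, rake_scores) are assumptions made up for illustration, and a real run would need a fuller stopword list. Candidate phrases are runs of non-stopwords, each word is scored as degree divided by frequency, and a phrase's score is the sum of its word scores.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real use needs a fuller one).
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "we", "with",
}


def candidate_phrases(text):
    """Split text into runs of non-stopwords, breaking at punctuation."""
    phrases = []
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_scores(text):
    """Return (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)    # how often each word appears
    degree = defaultdict(int)  # co-occurrence count, including the word itself
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    scores = {
        " ".join(phrase): sum(degree[w] / freq[w] for w in phrase)
        for phrase in set(phrases)
    }
    # Sort by descending score, then alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    sample = (
        "Keyword extraction of dissertation abstracts. "
        "Automatic keyword extraction is useful for dissertation abstracts."
    )
    for phrase, score in rake_scores(sample):
        print(f"{score:5.1f}  {phrase}")
```

Because longer phrases accumulate the scores of all their words, this kind of scoring tends to favor multi-word phrases over single frequent words, which matches the behavior described above.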
Best,
Ian
Ian Matzen
He/Him/His
Systems and Digital Initiatives Librarian
Westfield State University
Westfield, MA 01086-1630
(413) 351 9178
[log in to unmask] | westfield.ma.edu <http://westfield.ma.edu/>
On Oct 22, 2020, at 2:25 PM, Edward M. Corrado <[log in to unmask]> wrote:
Hello,
I have a set of just over 60,000 thesis and dissertation abstracts that I
want to automatically create keywords/topics from. Does anyone have any
recommendations for text mining or other tools to start with?
Regards,
Edward