On Oct 22, 2020, at 2:25 PM, Edward M. Corrado <[log in to unmask]> wrote:

> I have a set of just over 60,000 theses and dissertations abstracts that I
> want to automatically create keywords/topics from. Does anyone have any
> recommendations for text mining or other tools to start with?


I do this sort of thing on a regular basis, and I use two Python modules, both from the Textacy library:

  1. textacy.ke.scake
  2. textacy.ke.yake

Textacy is built on top of another library called "spaCy". 

To use the libraries, one:

  1. gets a string
  2. creates a spaCy doc object from the string
  3. applies the scake or yake methods to the object
  4. gets back a keyword (or phrase) plus a score
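
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the attached script. The function names extract_keywords and format_rows are made up for this example, the model name "en_core_web_sm" is an assumption (any installed spaCy English model should do), and the fallback import reflects my understanding that newer Textacy releases moved these functions from textacy.ke to textacy.extract.keyterms.

```python
from typing import List, Tuple


def extract_keywords(text: str, method: str = "yake", topn: int = 10,
                     model: str = "en_core_web_sm") -> List[Tuple[str, float]]:
    """Return (keyword or phrase, score) pairs for one string.

    Hypothetical helper for illustration; requires textacy and a spaCy model.
    """
    # Imported here so that format_rows() below works without textacy installed.
    import textacy
    try:
        # Module layout named in this message (older textacy releases).
        from textacy import ke
    except ImportError:
        # Assumption: newer releases expose the same functions here.
        from textacy.extract import keyterms as ke

    # Steps 1 and 2: take a string and create a spaCy doc object from it.
    doc = textacy.make_spacy_doc(text, lang=model)

    # Step 3: apply the scake or yake function to the doc.
    algorithm = ke.scake if method == "scake" else ke.yake

    # Step 4: get back a list of (keyword, score) tuples.
    return algorithm(doc, topn=topn)


def format_rows(pairs: List[Tuple[str, float]]) -> str:
    """Render (keyword, score) pairs as tab-delimited lines."""
    return "\n".join(f"{term}\t{score:.4f}" for term, score in pairs)
```

With Textacy and a model installed, print(format_rows(extract_keywords(abstract, method="scake"))) would emit one keyword and score per line for a single abstract; looping over all 60,000 abstracts is then a matter of reading each one and calling the same function.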

Attached is a script which takes a file as input and outputs a tab-delimited stream of keywords/phrases.

--
Eric Morgan