Hi Athina,

The extractors are very different in terms of what they're optimized to
work with and what they're designed to extract -- you need one designed for
your purposes, and you may need more than one.

A few years back, I experimented with a number of extractors before
settling on Alchemy as best suited for the purpose at hand -- namely
analyzing transcriptions of interviews to identify topical, place, and
person access points.

A few observations:

   1. There was enormous variation in output from various extractors, and
   effectiveness varied dramatically depending on what we tested them against.
   Some are good at what they're designed to do, and none were any good at
   what they weren't designed to do.  For example, automatic analysis of EAD
   finding aids was uniformly abysmal.

   2. What you think is important and what an extractor thinks is important
   are different. Highly important terms often appear infrequently while
   irrelevant terms often appear repeatedly -- so term recurrence is a poor
   indicator of importance.

   Unless you let the extractor decide what's important, you'll get so much
   output that you'll need a postprocessing routine (e.g. comparison of terms
   against a list or at least pattern matching).

   3. All extractors missed important terms and returned many irrelevant
   ones. However, enough important terms were returned for it to be worthwhile.
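
For what it's worth, the postprocessing mentioned in point 2 can be quite
small. Here is a rough Python sketch, not what we actually ran: it assumes the
extractor hands back a plain list of strings, and the function name
`filter_terms`, the sample vocabulary, and the sample pattern are all made up
for illustration. It keeps a term only if it appears in a known list or
matches a pattern, and drops case-insensitive duplicates.

```python
import re


def filter_terms(candidates, vocabulary, patterns=()):
    """Keep candidate terms found in a vocabulary or matching a pattern.

    candidates: raw terms from an extractor (assumed to be plain strings)
    vocabulary: terms you already know you care about
    patterns:   regular expressions for terms that follow a known shape
    """
    vocab = {v.casefold() for v in vocabulary}
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]

    kept = []
    seen = set()
    for term in candidates:
        cleaned = term.strip()
        key = cleaned.casefold()
        # Skip empty strings and terms already kept in another casing.
        if not key or key in seen:
            continue
        if key in vocab or any(rx.search(key) for rx in compiled):
            kept.append(cleaned)
            seen.add(key)
    return kept


# Hypothetical extractor output and filters, for illustration only.
candidates = ["Photosynthesis", "thing", "Grade 5",
              "photosynthesis", "um", "water cycle"]
vocabulary = ["photosynthesis", "water cycle"]
patterns = [r"^grade \d+$"]

result = filter_terms(candidates, vocabulary, patterns)
print(result)
```

Note that nothing here looks at how often a term occurs, which fits the
observation that recurrence is a poor indicator of importance. The list that
comes out is still meant for a human to scan, not to be trusted blindly.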

Bottom line is that we found it a viable "quick 'n dirty" approach to
generate lists of terms a human could quickly scan, but not good for
anything else we were interested in.

Good luck on your project, and I hope you'll share what you discover.

kyle

On Mon, Sep 16, 2019 at 9:20 AM Athina Livanos-Propst <
[log in to unmask]> wrote:

> Hi everyone,
>
> I'm starting to think around a project that would involve key terms from
> other types of text (transcripts, captions, documents). I'm basically
> trying to build a tool that I can use to extract key terms from larger
> strings of text, i.e. pull out the important words from a larger sentence.
>
> Does anyone have recommendations for tools that already exist out there?
>
> Thanks,
> Athina
>
> PS- I'm thinking specifically about state K-12 educational standards, if
> anyone wants to talk more specifically with me.
>
>
> Athina Livanos-Propst
> Digital Librarian & Editorial Services, Manager | PBS Education
> O: 703-739-5485 | [log in to unmask]
>