LISTSERV 16.5 - CODE4LIB Archives

On Fri, Apr 8, 2016 at 8:13 AM, Jenn C wrote:

> I worked on a text mining project last semester where I had a bunch of
> magazines with text that was totally unstructured (from IA). I would have
> really liked to know how to work entity matching into such a project. Are
> there text mining projects out there that demonstrate doing this?
>

What did you use for entity identification? My gut reaction would be to
look at what the entity extractor pulled out and then normalize the source
in the hopes of improving the accuracy. Even when a controlled vocabulary
is not used, normalizing the data makes a massive difference.
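
To make "normalizing" a bit more concrete, here is a rough sketch of the kind
of cleanup I have in mind, applied to the strings an extractor returns. The
function names (normalize_entity, group_variants) and the sample names are
made up for illustration, and the rules (strip accents, fold case, drop
punctuation, collapse whitespace) are only one plausible starting point, not
a recommendation for any particular corpus.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_entity(name: str) -> str:
    """Return a crude matching key for an extracted entity string (hypothetical helper)."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold case, turn punctuation into spaces, and collapse runs of whitespace.
    folded = stripped.casefold()
    no_punct = re.sub(r"[^\w\s]", " ", folded)
    return re.sub(r"\s+", " ", no_punct).strip()


def group_variants(names):
    """Group raw entity strings that share the same normalized key."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_entity(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Illustrative input only; not taken from any real extractor output.
    extracted = ["New-York Times", "new york  times", "NEW YORK TIMES", "José Martí"]
    for key, variants in group_variants(extracted).items():
        print(key, "<-", variants)
```

Grouping on a key like this will not resolve real ambiguity (abbreviations,
misspellings, two people with the same name), but it shows how much of the
variation is just surface noise before any fuzzier matching is tried.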

I am curious as to what the data for the Panama Papers looked like going
in. I would think significant normalization and structuring would be
necessary to leverage the advantages of using Blacklight over other tools.

kyle