Since you may be looking at Drupal integration down the road, I would look
at using Python and NLTK, and develop a web service that could then be
used by Drupal.
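
To make the idea a bit more concrete, here is a rough sketch of what the
term-weighting step (your step 2 and 3) might look like in Python. It is only
an illustration under some assumptions: it uses the standard library rather
than NLTK so it runs anywhere, the tokenizer and the tiny stopword list are
placeholders (NLTK's tokenizers and stopword corpus would be the obvious
upgrade), and the function names suggest_terms and suggest_terms_json are
made up for this example. It scores each term as term frequency times
log(N / document frequency), which is one common TF-IDF variant among several.

```python
import json
import math
import re
from collections import Counter

# Placeholder stopword list for illustration only; a real tool would use a
# fuller list (for example the one shipped with NLTK).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def suggest_terms(documents, doc_index, top_n=5):
    """Return up to top_n (term, score) pairs for documents[doc_index].

    Score = (term count / tokens in document) * log(N / document frequency).
    A term that occurs in every document therefore scores 0.
    """
    tokenized = [tokenize(d) for d in documents]

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    n_docs = len(documents)
    tf = Counter(tokenized[doc_index])
    total = sum(tf.values()) or 1  # avoid dividing by zero on empty text

    scores = {
        term: (count / total) * math.log(n_docs / df[term])
        for term, count in tf.items()
    }
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


def suggest_terms_json(documents, doc_index, top_n=5):
    """Serialize the suggestions as JSON, the sort of payload a web service
    could hand back to a Drupal module."""
    pairs = suggest_terms(documents, doc_index, top_n)
    return json.dumps([{"term": t, "score": round(s, 4)} for t, s in pairs])
```

You would feed it the text extracted from each PDF as one string per
document, then wrap suggest_terms_json in whatever HTTP framework you prefer
so Drupal can call it.
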
On 01/07/2014 11:13 PM, "Katie" <[log in to unmask]> wrote:
> Hello,
>
> Has anyone here experience in the world of natural language programming
> (while applying information retrieval techniques)?
>
> I'm currently trying to develop a tool that will:
>
> 1. take a pdf and extract the text (paying no attention to images or
> formatting)
> 2. analyze the text via term weighting, inverse document frequency, and
> other natural language processing techniques
> 3. assemble a list of suggested terms and concepts that are weighted
> heavily in that document
>
> Step 1 is straightforward and I've had much success there. Step 2 is the
> problem child. I've played around with a few APIs (like AlchemyAPI) but
> they have character length limitations or other shortcomings that keep me
> looking.
>
> The background behind this project is that I work for a digital library
> with a large pre-existing collection of pdfs with rudimentary metadata. The
> aforementioned tool will be used to classify and group the pdfs according
> to the themes of the library. Our CMS is Drupal so depending on my level of
> ambition, this *might* develop into a module.
>
> Does this sound like a project that has been done/attempted before? Any
> suggested tools or reading materials?
>