It's been a while since I perled, so this might not be the most
idiomatic solution, but you could stem the entire words hash once
and create a hash of the summed counts per stem (%words_stems), then
run through the list of idea words (@ideas), checking only the desired
stems:

use strict;
use warnings;
use Lingua::Stem::Snowball;

my @ideas = ('books', 'otters', 'library');
my %words = ( 'books' => 5,
              'library' => 6,
              'librarianship' => 5,
              'librarians' => 3,
              'librarian' => 3,
              'book' => 3,
              'museums' => 2
            );

# Sum the counts of all words that share a stem.
# (Declared empty; assigning {} would store a hash reference as a key.)
my %words_stems;
my $stemmer = Lingua::Stem::Snowball->new( lang => 'en' );
foreach my $word (keys %words)
{
    $words_stems{ $stemmer->stem($word) } += $words{$word};
}

# Look up the total for each idea's stem; print 0 when nothing matches.
foreach my $idea (@ideas)
{
    my $idea_stem = $stemmer->stem( $idea );
    print "$idea ($idea_stem)\n";
    print( ($words_stems{$idea_stem} // 0) . "\n" );
}

The first foreach loop is executed once per word in %words, while the
second foreach loop gets run once per item in @ideas. So 150,000 words
with 1,000 ideas would call the stem function (which is presumably
where all the cost is) about 151,000 times in total: once per word
plus once per idea.
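
If you want to check the call count on a small scale, here is a rough
sketch. It does not use the real stemmer: fake_stem() is a made-up
stand-in that only strips a trailing "s" and counts how often it is
called, and the word list is cut down for illustration.

```perl
use strict;
use warnings;

my $calls = 0;

# Hypothetical stand-in for $stemmer->stem(): strips a trailing "s"
# and counts each call. Not a real stemming algorithm.
sub fake_stem {
    my ($word) = @_;
    $calls++;
    $word =~ s/s$//;
    return $word;
}

my %words = ( books => 5, book => 3, museums => 2 );
my @ideas = ( 'books', 'otters' );

# Pass 1: one stem call per word.
my %words_stems;
$words_stems{ fake_stem($_) } += $words{$_} for keys %words;

# Pass 2: one stem call per idea.
my @totals = map { $words_stems{ fake_stem($_) } // 0 } @ideas;

print "stem calls: $calls\n";    # 3 words + 2 ideas = 5
print "totals: @totals\n";
```

The count grows as words plus ideas, not words times ideas.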
If you plan on doing something similar later, you could save that hash
to disk, btw.
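
One way to do that (a minimal sketch, assuming the core Storable
module is acceptable) is below. The file name is made up, and the
small hash stands in for the real %words_stems.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

# Illustrative stand-in for the %words_stems hash built earlier.
my %words_stems = ( book => 8, museum => 2 );

# Hypothetical file name; any writable path will do.
my $cache_file = 'words_stems.storable';

# store() takes a reference and writes the structure to disk.
store( \%words_stems, $cache_file );

# In a later run, load the hash back instead of re-stemming.
my $cached = retrieve($cache_file);
print "book: $cached->{book}\n";
```

If the file has to move between machines, Storable's nstore() writes
it in a portable byte order.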
Ben
--
Benjamin Florin
Technology Assistant for Blended Education
Simmons College GSLIS
617-521-2842