Presumably the call to stem() is the expensive part of your loop, so that
is what I'd try to cut down. stem() accepts an array reference, so there's
no need to call it inside a loop at all. Something like the code below
should reduce your calls to stem() to two: one for the idea and one for the
whole list of words. Note that I pull the keys into an array once and use
that same array both for stemming and for looking up the counts, which
keeps the words and their stems in the same order when adding up the
totals. I also sorted the keys; that is not required for the alignment and
could be expensive, so this may not work out better for you, depending on
your input data and the performance of sort() and stem(). You could also
use stem_in_place() if you don't want to make a copy of the array. Changing
to use an array of @ideas instead of the scalar $idea would use an
analogous technique.

Matt

use strict;
use warnings;
use Lingua::Stem::Snowball;

my $idea  = 'books';
my %words = ( 'books'         => 5,
              'library'       => 6,
              'librarianship' => 5,
              'librarians'    => 3,
              'librarian'     => 3,
              'book'          => 3,
              'museums'       => 2,
            );

my $stemmer = Lingua::Stem::Snowball->new( lang => 'en' );

# One call to stem the idea; list assignment takes the first stem returned.
my ($idea_stem) = $stemmer->stem( $idea );
print "$idea ($idea_stem)\n";

# One call to stem every word; $stemwords[$i] is the stem of $wordkeys[$i].
my @wordkeys  = sort keys %words;
my @stemwords = $stemmer->stem( \@wordkeys );

# Add up the counts of all words that share the idea's stem.
my $total = 0;
for my $i ( 0 .. $#wordkeys ) {
    $total += $words{ $wordkeys[$i] } if $idea_stem eq $stemwords[$i];
}
print "$total\n";
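
For the @ideas variant and stem_in_place() mentioned above, here is a rough sketch of how the same idea might look. The contents of @ideas and the names %count_for_stem and %total_for are made up for illustration. It relies on keys() and values() returning their lists in the same order for an unmodified hash, so no sort is needed, and it sums the counts per stem once so each idea becomes a single hash lookup. It needs Perl 5.10 or later for the // operator.

```perl
use strict;
use warnings;
use Lingua::Stem::Snowball;

my @ideas = ( 'books', 'librarians' );    # hypothetical list of ideas
my %words = ( 'books'         => 5,
              'library'       => 6,
              'librarianship' => 5,
              'librarians'    => 3,
              'librarian'     => 3,
              'book'          => 3,
              'museums'       => 2,
            );

my $stemmer = Lingua::Stem::Snowball->new( lang => 'en' );

# keys() and values() return the same order for an unmodified hash,
# so $counts[$i] is the count for the word that started out in $stems[$i].
my @stems  = keys %words;
my @counts = values %words;

# Overwrite the words with their stems instead of building a second array.
$stemmer->stem_in_place( \@stems );

# Sum the counts per stem in a single pass.
my %count_for_stem;
$count_for_stem{ $stems[$_] } += $counts[$_] for 0 .. $#stems;

# One stem() call for all ideas, then one hash lookup per idea.
my @idea_stems = $stemmer->stem( \@ideas );
my %total_for;
for my $i ( 0 .. $#ideas ) {
    my $total = $count_for_stem{ $idea_stems[$i] } // 0;
    $total_for{ $ideas[$i] } = $total;
    print "$ideas[$i] ($idea_stems[$i]): $total\n";
}
```

Whether the per-stem hash pays off depends on how many ideas you have; with only one or two, the plain loop in the first version is probably just as good.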