LISTSERV 16.5 - CODE4LIB Archives

Hi Eric,

That's not ideal. A simple additive checksum produces the same number 
when the characters in the string are rearranged. For example, "The cat 
chases the dog" and "The dog chases the cat" would result in the same 
checksum.
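
To make the collision concrete, here is a minimal sketch that applies the same "%32C*" template from your message to both sentences. The helper name additive_checksum is made up for illustration.

```perl
use strict;
use warnings;

# Sum of the character codes, kept to 32 bits (the "%32C*" unpack template).
# additive_checksum() is a hypothetical helper name used only for this demo.
sub additive_checksum {
    my ($text) = @_;
    return unpack('%32C*', $text);
}

my $first  = additive_checksum('The cat chases the dog');
my $second = additive_checksum('The dog chases the cat');

# Both strings contain exactly the same characters, so the sums are equal.
print "first=$first second=$second\n";
print "collision\n" if $first == $second;
```

Any two author/title strings that are rearrangements of each other collide the same way.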

You'd be better off using md5(): http://perldoc.perl.org/Digest/MD5.html

Something like:
use Digest::MD5 qw(md5);

# If you want a short integer (2 bytes: 0 - 65535)
my ($integer) = unpack('S', md5($author . $title));

# If you want a long integer (4 bytes: 0 - 4294967295)
my ($integer) = unpack('L', md5($author . $title));
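
A possible refinement, offered as a sketch rather than a requirement: the 'S' and 'L' templates read the bytes in the machine's native byte order, so the same input can give a different integer on a different architecture. The version below assumes you want identical keys everywhere and uses the big-endian 'N' template instead. It also assumes a "\0" separator between the fields and UTF-8 encoding of the input, since md5() dies on wide characters. The helper name stable_key is hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);
use Encode qw(encode);

# stable_key() is a hypothetical helper name, not part of Digest::MD5.
sub stable_key {
    my ($author, $title) = @_;

    # Assumption: "\0" never occurs in the data, so it keeps
    # ("ab", "c") distinct from ("a", "bc") before hashing.
    # md5() croaks on wide characters, so encode to UTF-8 bytes first.
    my $bytes = encode('UTF-8', $author . "\0" . $title);

    # 'N' = unsigned 32-bit integer, big-endian: same result on any platform.
    my ($integer) = unpack('N', md5($bytes));
    return $integer;
}

print stable_key('Twain, Mark', 'Roughing It'), "\n";
```

Swap 'N' for 'n' if a 2-byte value (0 - 65535) is enough.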

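That gives you a value that is the same on every run, but it is only as 
unique as a short or long int allows: two different author/title pairs 
can still hash to the same number, and with a 2-byte value the odds of 
at least one collision pass 50% at roughly 300 items. If you have few 
enough items in the list that you're willing to increase the odds of 
non-uniqueness in exchange for a smaller maximum number, you can use 
the % operator as in: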

# If you want an integer between 0 and 9999
my ($integer) = unpack('S', md5($author . $title));
$integer = $integer % 10000;
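
To judge how much uniqueness you give up with a smaller range, here is a rough sketch using the standard birthday approximation. It assumes the keys are spread evenly over the range, which is close enough for MD5 output. The function name collision_probability and the list size of 500 are only examples.

```perl
use strict;
use warnings;

# Approximate probability that at least two of $items keys collide when
# keys are spread evenly over $range possible values (birthday approximation).
# collision_probability() is a hypothetical helper name.
sub collision_probability {
    my ($items, $range) = @_;
    return 1 - exp(-$items * ($items - 1) / (2 * $range));
}

# Compare the three ranges discussed above for a list of 500 items.
for my $range (10_000, 65_536, 4_294_967_296) {
    printf "range %-11s 500 items: %.4f\n",
        $range, collision_probability(500, $range);
}
```

If the probability for your list size is uncomfortably high, pick the larger range or check new keys against the ones already issued.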

Alex.

Eric Lease Morgan wrote:
>> Using Perl, how can I convert the author/title combination into some sort of integer, checksum, or unique value that is the same every time I run my script? I don't want to have to remember what was used before because I don't want to maintain a list of previously used keys. Should I use some form of the pack function? Should I sum the ASCII values of each character in the author/title combination?
>>     
>
>
> Thank you for the prompt replies, and invariably I resolved my own question. Using Perl's unpack function I can generate a checksum based on the concatenation of the authors and titles:
>
>   my $integer = unpack( "%32C*", "$author$title" ) % 65535;
>
> The result is a unique four-digit number that will be consistently generated as my list of author/title combinations grows. At the same time, my solution looks much like an incantation -- with magic. Perl-specific and at a level of computing that is beyond my day-to-day understanding.
>
> TGIF
>
>