Thanks for the clarification, Sol. You're right: it depends on the checksum
algorithm (http://en.wikipedia.org/wiki/Checksum). I'm not sure what algorithm
Perl uses as part of unpack('%32C*'), but you're right that POSIX cksum uses a
CRC algorithm (http://en.wikipedia.org/wiki/Cyclic_redundancy_check) that is
position-dependent.

Sol Lederman wrote:
> Alex,
>
> Permuting the characters in a string does not produce the same checksum. If
> it did, that would make checksums really weak. I don't know of any checksum
> algorithm that produces the same checksum when you merely permute the
> characters.
>
> Here's an example on my iMac.
>
> echo "The cat chases the dog" > foo1
> echo "The dog chases the cat" > foo2
> cksum foo1
> 414128224 23 foo1
> cksum foo2
> 2453586855 23 foo2
>
> Sol
>
> On Fri, May 28, 2010 at 10:26 AM, Alex Bronstein wrote:
>
>> Hi Eric,
>>
>> That's not ideal. checksums generate the same number if the letters in the
>> string are moved. For example "The cat chases the dog" and "The dog chases
>> the cat" would result in the same checksum.
>>
>> You'd be better off using md5(): http://perldoc.perl.org/Digest/MD5.html
>>
>> Something like:
>> # If you want a short integer (2 bytes: 0 - 65535)
>> my ($integer) = unpack('S', md5($author . $title));
>>
>> # If you want a long integer (4 bytes: 0 - 4 billion)
>> my ($integer) = unpack('L', md5($author . $title));
>>
>> That would give you uniqueness to within the capability of a short or long
>> int. If you have few enough items in the list that you're willing to
>> increase the odds of non-uniqueness in exchange for a smaller maximum
>> number, you can use the % operator as in:
>>
>> # If you want an integer between 0 and 9999
>> my ($integer) = unpack('S', md5($author . $title));
>> $integer = $integer % 10000;
>>
>> Alex.
>>
>>
>> Eric Lease Morgan wrote:
>>
>>>> Using Perl, how can I convert the author/title combination into some sort
>>>> of integer, checksum, or unique value that is the same every time I run my
>>>> script? I don't want to have to remember what was used before because I
>>>> don't want to maintain a list of previously used keys. Should I use some
>>>> form of the pack function? Should I sum the ASCII values of each character
>>>> in the author/title combination?
>>>
>>> Thank you for the prompt replies, and invariably I resolved my own
>>> question. Using Perl's unpack function I can generate a checksum based on
>>> the concatenation of the authors and titles:
>>>
>>> my $integer = unpack( "%32C*", "$author$title" ) % 65535;
>>>
>>> The result is a unique four-digit number that will be consistently
>>> generated as my list of author/title combinations grows. At the same time,
>>> my solution looks much like an incantation -- with magic. Perl-specific and
>>> at a level of computing that is beyond my day-to-day understanding.
>>>
>>> TGIF
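
P.S. On the open question of what unpack('%32C*') does: perldoc -f unpack describes the %<number> prefix as a checksum formed by summing the numeric values of the unpacked items and keeping that many bits. If that reading is right, "%32C*" is a plain additive checksum, and a sum does not depend on character order, unlike the CRC used by cksum. Below is a minimal sketch that compares it with the first four bytes of an MD5 digest. The variable names and the 'N' template (big-endian, so the result is the same on any machine) are my own illustrative choices, not something from the thread.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# The two example strings from the thread: same characters, different order.
my $s1 = "The cat chases the dog";
my $s2 = "The dog chases the cat";

# "%32C*" sums the byte values of the string, keeping the low 32 bits.
my $sum1 = unpack('%32C*', $s1);
my $sum2 = unpack('%32C*', $s2);

# md5() returns 16 raw bytes; 'N' reads the first four as an
# unsigned 32-bit big-endian integer.
my $md5_1 = unpack('N', md5($s1));
my $md5_2 = unpack('N', md5($s2));

printf "additive checksum: %u %u\n", $sum1, $sum2;
printf "md5-derived int:   %u %u\n", $md5_1, $md5_2;
print  "additive checksums ", ($sum1 == $sum2 ? "match" : "differ"), "\n";
```

Separately, note that % 65535 yields values from 0 to 65534, so the result can run to five digits, and any value squeezed into that range can collide once the list grows.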