Alex,
Permuting the characters in a string does not, in general, produce the same
checksum. If it did, that would make checksums really weak. The exception is
a purely additive checksum, which just sums the character values and so
ignores their order. A CRC-based checksum, such as the one cksum computes,
changes when you merely permute the characters.
Here's an example on my iMac.
$ echo "The cat chases the dog" > foo1
$ echo "The dog chases the cat" > foo2
$ cksum foo1
414128224 23 foo1
$ cksum foo2
2453586855 23 foo2
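
The same comparison can be sketched in Perl. This is only an illustration
using the two sentences above; the variable names are made up for the
example. It puts a plain additive checksum (unpack's %32C* template, which
sums the byte values) next to an integer taken from an MD5 digest via the
core Digest::MD5 module. Only the additive sum ignores character order.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Two strings that contain the same characters in a different order.
my $s1 = "The cat chases the dog";
my $s2 = "The dog chases the cat";

# Additive checksum: unpack's %32C* sums the byte values (mod 2**32),
# so the order of the characters cannot affect the result.
my $sum1 = unpack('%32C*', $s1);
my $sum2 = unpack('%32C*', $s2);

# Order-sensitive value: the first 4 bytes of the MD5 digest, read as a
# big-endian unsigned 32-bit integer ('N').
my $md5_1 = unpack('N', md5($s1));
my $md5_2 = unpack('N', md5($s2));

print "additive: $sum1 $sum2\n";     # the two sums are equal
print "md5:      $md5_1 $md5_2\n";   # the two values differ
```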
Sol
On Fri, May 28, 2010 at 10:26 AM, Alex Bronstein wrote:
> Hi Eric,
>
> That's not ideal. A checksum that simply sums the characters generates the
> same number if the letters in the string are moved. For example "The cat
> chases the dog" and "The dog chases the cat" would result in the same
> checksum.
>
> You'd be better off using md5(): http://perldoc.perl.org/Digest/MD5.html
>
> Something like:
> use Digest::MD5 qw(md5);   # md5() returns the 16-byte binary digest
>
> # If you want a short integer (2 bytes: 0 - 65535)
> my ($integer) = unpack('S', md5($author . $title));
>
> # If you want a long integer (4 bytes: 0 - 4294967295)
> my ($integer) = unpack('L', md5($author . $title));
>
> That would make collisions unlikely, to within the range of a short or long
> int, but it cannot guarantee uniqueness. If you have few enough items in the
> list that you're willing to increase the odds of a collision in exchange for
> a smaller maximum number, you can use the % operator as in:
>
> # If you want an integer between 0 and 9999
> my ($integer) = unpack('S', md5($author . $title));
> $integer = $integer % 10000;
>
> Alex.
>
>
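
Wrapped up as a function, the MD5 approach might look like the sketch below.
The name key_for, the "\x1F" separator, and the sample author and title are
assumptions made for illustration, not something from the thread. The
separator keeps ("ab", "c") and ("a", "bc") from concatenating to the same
string, 'N' is used in place of 'L' so the result does not depend on the
machine's byte order, and the text is encoded to bytes first because md5()
dies on wide characters.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);
use Encode qw(encode_utf8);

# Hypothetical helper: map an author/title pair to an integer in
# 0 .. $range-1. The same input gives the same output on every run.
sub key_for {
    my ($author, $title, $range) = @_;

    # Encode to bytes first; md5() croaks on wide characters.
    my $bytes = encode_utf8($author . "\x1F" . $title);

    # First 4 digest bytes as a big-endian unsigned 32-bit integer.
    my ($n) = unpack('N', md5($bytes));

    return $n % $range;
}

# Made-up sample values, reduced to the range 0 .. 9999.
my $key = key_for('Melville, Herman', 'Moby Dick', 10_000);
print "$key\n";
```

A smaller range makes the numbers easier to handle but raises the odds that
two different items land on the same key.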
> Eric Lease Morgan wrote:
>
>>> Using Perl, how can I convert the author/title combination into some sort
>>> of integer, checksum, or unique value that is the same every time I run my
>>> script? I don't want to have to remember what was used before because I
>>> don't want to maintain a list of previously used keys. Should I use some
>>> form of the pack function? Should I sum the ASCII values of each character
>>> in the author/title combination?
>>>
>>>
>>
>>
>> Thank you for the prompt replies, and invariably I resolved my own
>> question. Using Perl's unpack function I can generate a checksum based on
>> the concatenation of the authors and titles:
>>
>> my $integer = unpack( "%32C*", "$author$title" ) % 65535;
>>
>> The result is a short number (four digits for strings of typical length)
>> that will be consistently generated as my list of author/title combinations
>> grows, although two different combinations can still share a value. At the
>> same time, my solution looks much like an incantation -- with magic,
>> Perl-specific and at a level of computing that is beyond my day-to-day
>> understanding.
>>
>> TGIF
>>
>>
>>
>
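
For what it's worth, the incantation is less magic than it looks. In an
unpack template, C* means "every byte as an unsigned number", and the %32
prefix asks for a 32-bit checksum (a sum) of those numbers instead of the
numbers themselves. A rough equivalent with an explicit loop, using a sample
string made up for the example and assuming plain ASCII text:

```perl
use strict;
use warnings;

# Made-up sample standing in for "$author$title" (plain ASCII).
my $text = "Melville, HermanMoby Dick";

# The one-liner from the thread, without the final modulo.
my $packed = unpack('%32C*', $text);

# The same thing spelled out: add up the value of every character,
# keeping only the low 32 bits of the running total.
my $sum = 0;
$sum = ($sum + ord($_)) & 0xFFFFFFFF for split //, $text;

print "$packed $sum\n";   # both numbers are the same
```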