On 05/28/2010 09:42 AM, Houghton,Andrew wrote:

>>
>> Using Perl, how can I convert the author/title combination into some
>> sort of integer, checksum, or unique value that is the same every time
>> I run my script? I don't want to have to remember what was used before
>> because I don't want to maintain a list of previously used keys. Should
>> I use some form of the pack function? Should I sum the ASCII values of
>> each character in the author/title combination?
> 
> You could MD5 hash the author/title combination, which would give you the
> same hash so long as the author/title combination was the same (e.g., in
> letter case and spelling). However, that doesn't meet your requirement of
> a small integer, but if you are using the value as a Perl hash key it
> might not matter all that much.
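
A minimal sketch of the MD5 suggestion above, using Digest::MD5 (core Perl). The helper name author_title_key, the "|" separator, and the lowercasing/whitespace squeezing are assumptions added for illustration, not part of either post. It also touches the question about pack: unpack's "%32C*" checksum is the "sum the character codes" idea, and it can't tell anagrams apart.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: a stable key for an author/title pair.
# Lowercasing and squeezing whitespace is an assumption, so that
# differences only in case or spacing map to the same digest.
sub author_title_key {
    my ($author, $title) = @_;
    my $s = lc "$author|$title";
    $s =~ s/\s+/ /g;
    return md5_hex($s);    # 32 hex digits, identical on every run
}

my $key = author_title_key('Aristotle', 'On the Heavens');

# If a small integer is wanted, keep the first 8 hex digits (32 bits).
# Truncation makes collisions more likely than with the full digest.
my $small = hex(substr($key, 0, 8));

# For comparison, summing character codes (unpack's "%32C*" checksum)
# cannot tell anagrams apart: both strings below sum to 294.
my ($sum_abc, $sum_cab) = (unpack('%32C*', 'abc'), unpack('%32C*', 'cab'));

print "$key $small $sum_abc $sum_cab\n";
```

The digest depends only on the input string, so no list of previously used keys has to be kept.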


If you don't need the one-in-a-gazillion odds against collisions that you
get from MD5 or SHA, you could resort to good old CRC32, which gives you a
32-bit integer.  You may need to pick up String::CRC32 from CPAN.  Then
it's as simple as $checksum = crc32($string);
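A sketch of the CRC32 route. The post recommends String::CRC32 from CPAN; as a swapped-in assumption, this version uses the crc32() exported by Compress::Zlib, which ships with recent Perl distributions. Both should produce the standard CRC-32 (the zip/gzip checksum). The key string is hypothetical.

```perl
use strict;
use warnings;
# Assumption: Compress::Zlib's crc32() stands in for String::CRC32's
# crc32(); both compute the standard CRC-32 checksum.
use Compress::Zlib qw(crc32);

my $string   = 'aristotle|on the heavens';   # hypothetical key string
my $checksum = crc32($string);               # unsigned 32-bit integer

printf "%s -> %u (0x%08x)\n", $string, $checksum, $checksum;
```

The trade-off is the 32-bit range: by the birthday bound, roughly 77,000 distinct keys give about even odds that at least two share a checksum.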

BTW, won't a lot of your keys come out as "smith.the" (or, in your
examples, "aristotle.on")?  If the key isn't unique, any hash you
calculate from it won't be unique either.  You might need to calculate it
from the complete title.
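To illustrate the point above: the short_key scheme below (surname, a dot, then the first title word) is a hypothetical approximation of keys like "aristotle.on", since the original poster's exact scheme isn't shown here, and the two titles are illustrative. Two different books collapse to one short key, while a digest of the complete author and title keeps them apart.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical short-key scheme: surname, a dot, then the first title
# word, in the style of "aristotle.on".
sub short_key {
    my ($author, $title) = @_;
    my ($first_word) = split ' ', lc $title;
    return lc($author) . '.' . $first_word;
}

# Digest of the complete author and title instead of the short key.
sub full_key {
    my ($author, $title) = @_;
    return md5_hex(lc "$author|$title");
}

# Two illustrative books by the same author.
my @books = (['Aristotle', 'On the Heavens'], ['Aristotle', 'On the Soul']);

for my $book (@books) {
    printf "%-14s %s\n", short_key(@$book), full_key(@$book);
}
```

Both lines print the same "aristotle.on" short key but different digests. Feeding the full string to crc32() instead would work the same way; only the hash function changes.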


-- 
Thomas Dowling