On Mon, Dec 12, 2011 at 10:56 AM, Michael B. Klein <[log in to unmask]> wrote:
> Here's a snippet that will completely randomize the contents of an
> arbitrary string while retaining the general flow (vowels replaced with
> vowels, consonants replaced with consonants, with case retained in both
> instances; digits replaced with digits; and everything else left alone).
>
> https://gist.github.com/1468557
I like the way the output looks, but one problem with random output is
that the same word can come out as a different value each time. The
distribution of unique words would also change; I'm not sure whether that
would affect relevance, searching, or index size. Also, I was hoping to
support some sort of browsing, so I'm looking for something like a
pronounceable one-way hash. Maybe if I take the MD5 of the word, use that
as the seed for the random number generator, and then run your algorithm,
NASA would always "hash" to the same thing?
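
To make the idea concrete, here is a rough Python sketch of what I have in
mind. It is not the code from the gist; the names scramble_word and
scramble_text are made up for illustration, and seeding from the lowercased
word (so that "NASA" and "nasa" get the same letters) is my own assumption.
It also only handles ASCII letters and digits.

```python
import hashlib
import random
import re

VOWELS = "aeiou"
CONSONANTS = "bcdfghjklmnpqrstvwxyz"
DIGITS = "0123456789"


def scramble_word(word: str) -> str:
    """Replace vowels, consonants and digits in kind, deterministically."""
    # Assumption: seed from the lowercased word so case variants of the
    # same word map to the same replacement letters.
    digest = hashlib.md5(word.lower().encode("utf-8")).hexdigest()
    rng = random.Random(int(digest, 16))  # private generator, no global state

    out = []
    for ch in word:
        lower = ch.lower()
        if lower in VOWELS:
            repl = rng.choice(VOWELS)
        elif lower in CONSONANTS:
            repl = rng.choice(CONSONANTS)
        elif ch in DIGITS:
            repl = rng.choice(DIGITS)
        else:
            out.append(ch)  # anything else is left alone
            continue
        # Retain the case of the original character.
        out.append(repl.upper() if ch.isupper() else repl)
    return "".join(out)


def scramble_text(text: str) -> str:
    """Scramble each run of ASCII letters/digits; keep everything else."""
    return re.sub(r"[A-Za-z0-9]+", lambda m: scramble_word(m.group(0)), text)


if __name__ == "__main__":
    print(scramble_text("NASA launched Apollo 11 in 1969."))
    print(scramble_text("NASA"))  # same replacement for NASA as above
```

Because each word gets its own generator seeded from its own digest, a word
maps to the same output wherever it appears. The exact letters may differ
between Python versions, since the internals of random are not guaranteed
to stay identical across releases.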
Potential contributors of specimens would have to be okay with the fact
that a determined person could recreate their original records. The goal
is that an end user who might stumble across a random XTF tutorial
installation would not mistake what they are seeing for a real collection
description.
Hopefully nothing transforms into a swear word; I guess that is a problem
with Pig Latin as well...
Thanks for the feedback and the suggestion. I'll play with this some
tonight and see whether seeding from the input word gives the same
pseudo-random result every time; it seems like it should.