On Mon, Dec 12, 2011 at 10:56 AM, Michael B. Klein wrote:
> Here's a snippet that will completely randomize the contents of an
> arbitrary string while replacing the general flow (vowels replaced with
> vowels, consonants replaced with consonants (with case retained in both
> instances), digits replaced with digits, and everything else is left alone.
>
> https://gist.github.com/1468557

I like the way the output looks, but one problem with random output is that the same word might come out to a different value each time. The distribution of unique words would also be affected; I'm not sure whether that would impact relevance, searching, or index size. I was also hoping to support some sort of browsing, so I'm looking for something like a pronounceable one-way hash.

Maybe if I take the MD5 of the word, use that as the seed for the random number generator, and then run your algorithm, NASA would always "hash" to the same thing?

Potential contributors of specimens would have to be okay with the fact that a determined person could recreate their original records. The goal is that an end user who might stumble across a random XTF tutorial installation would not mistake what they are seeing for a real collection description.

Hopefully nothing transforms into a swear word; I guess that is a problem with pig latin as well...

Thanks for the feedback and the suggestion. I'll play with this tonight and see whether setting the seed based on the input word gets the same pseudo-random result every time. It seems like it should.
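
For what it's worth, here is a rough sketch of the seed-from-MD5 idea in Python. It is not the code from the gist; the function names (scramble_word, scramble_text), the ASCII-only letter sets, and the choice to lowercase the word before hashing are all my own assumptions for illustration. Each word gets its own generator seeded from its MD5, so the same word should always map to the same output, at least across runs on the same Python version.

```python
import hashlib
import random
import re

# Assumption: plain ASCII letters only; "y" is treated as a consonant.
VOWELS = "aeiou"
CONSONANTS = "bcdfghjklmnpqrstvwxyz"
DIGITS = "0123456789"


def scramble_word(word):
    """Deterministically scramble one word, keeping its vowel/consonant shape."""
    # Seed a private generator from the MD5 of the lowercased word, so that
    # "NASA" and "nasa" get the same letters (an assumption, not a requirement).
    digest = hashlib.md5(word.lower().encode("utf-8")).hexdigest()
    rng = random.Random(int(digest, 16))

    out = []
    for ch in word:
        lower = ch.lower()
        if lower in VOWELS:
            repl = rng.choice(VOWELS)
        elif lower in CONSONANTS:
            repl = rng.choice(CONSONANTS)
        elif ch in DIGITS:
            repl = rng.choice(DIGITS)
        else:
            out.append(ch)  # anything else is left alone
            continue
        # Retain the case of the original character.
        out.append(repl.upper() if ch.isupper() else repl)
    return "".join(out)


def scramble_text(text):
    """Scramble every run of ASCII letters/digits; leave the rest untouched."""
    return re.sub(r"[A-Za-z0-9]+", lambda m: scramble_word(m.group(0)), text)


if __name__ == "__main__":
    print(scramble_text("NASA records, 1969"))
    print(scramble_text("NASA records, 1969"))  # same output as the line above
```

Because the seed depends only on the word itself, anyone with the script can run a word list through it and match the outputs, which is the "determined person" caveat above. It also does nothing to filter out unfortunate output words.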