I've altered my previous function (https://gist.github.com/1468557)
into something that's pretty much a straight letter-substitution
cipher. It could be turned back into plaintext fairly easily by anyone
who really wanted to (using frequency analysis and other hints such as
single-letter words), but I can't imagine anyone going to that trouble
over finding aids. :)

This keeps words (and therefore word frequency/distribution)
consistent, even across changes in case. But if you really want it to
index realistically, it would need to be altered to leave common
suffixes (-s, -ies, -ed, -ing, etc.) alone (assuming the indexer uses
some sort of stemming algorithm).

On Mon, Dec 12, 2011 at 12:06 PM, Brian Tingle wrote:

> On Mon, Dec 12, 2011 at 10:56 AM, Michael B. Klein wrote:
>
> > Here's a snippet that will completely randomize the contents of an
> > arbitrary string while retaining the general flow (vowels replaced
> > with vowels, consonants replaced with consonants, with case
> > retained in both instances; digits replaced with digits; and
> > everything else left alone).
> >
> > https://gist.github.com/1468557
>
> I like the way the output looks, but one problem with the random
> output is that the same word might come out to different values. The
> distribution of unique words would also be affected; I'm not sure
> whether that would impact relevance/searching/index size. Also, I was
> sort of hoping to be able to have some sort of browsing, so I'm
> looking for something like a pronounceable one-way hash. Maybe if I
> take the md5 of the word, use that as the seed for random, and then
> run your algorithm, NASA would always "hash" to the same thing?
>
> Potential contributors of specimens would have to be okay with the
> fact that a determined person could recreate their original records.
> The goal is that an end user who might stumble across a random XTF
> tutorial installation would not mistake what they are seeing for a
> real collection description.
>
> Hopefully nothing transforms to a swear word; I guess that is a
> problem with pig latin as well...
>
> Thanks for the feedback and the suggestion. I'll play with this some
> tonight and see if setting the seed based on the input word works to
> get the same pseudo-random result. Seems like it should.
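
For anyone following along, here is a rough Python sketch of the two ideas in this thread: the fixed letter-substitution cipher described at the top of this message, and the md5-seeded per-word variant suggested in the quoted text. It is not the code from the gist. The names `build_table`, `scramble`, and `seeded_scramble`, the `key` parameter, and the choice to lowercase a word before hashing it are all my own assumptions for illustration.

```python
import hashlib
import random
import string

VOWELS = "aeiou"
CONSONANTS = "".join(c for c in string.ascii_lowercase if c not in VOWELS)
DIGITS = string.digits


def build_table(key="finding-aids"):
    """Build one fixed substitution table (hypothetical helper).

    Vowels map to vowels, consonants to consonants, digits to digits.
    The same key always yields the same table.
    """
    rng = random.Random(key)  # string seeds are deterministic across runs
    table = {}
    for alphabet in (VOWELS, CONSONANTS, DIGITS):
        shuffled = list(alphabet)
        rng.shuffle(shuffled)
        table.update(zip(alphabet, shuffled))
    return table


def _match_case(original, substitute):
    """Return the substitute in the same case as the original character."""
    return substitute.upper() if original.isupper() else substitute


def scramble(text, table):
    """Apply the substitution table; anything not in it is left alone."""
    out = []
    for ch in text:
        sub = table.get(ch.lower())
        out.append(ch if sub is None else _match_case(ch, sub))
    return "".join(out)


def seeded_scramble(word):
    """Per-word variant: seed the generator with the md5 of the word.

    Lowercasing before hashing is an assumption, made so that "NASA"
    and "nasa" come out as the same letters in different case.
    """
    digest = hashlib.md5(word.lower().encode("utf-8")).hexdigest()
    rng = random.Random(int(digest, 16))
    out = []
    for ch in word:
        low = ch.lower()
        for alphabet in (VOWELS, CONSONANTS, DIGITS):
            if low in alphabet:
                out.append(_match_case(ch, rng.choice(alphabet)))
                break
        else:
            out.append(ch)  # punctuation, whitespace, non-ASCII, etc.
    return "".join(out)


if __name__ == "__main__":
    table = build_table()
    print(scramble("NASA records, Box 12", table))
    print(seeded_scramble("NASA"), seeded_scramble("nasa"))
```

With a single table, `scramble` is a one-to-one mapping, which is exactly why frequency analysis can undo it. `seeded_scramble` gives the same output for the same word but is not one-to-one on letters, so two different words could in principle collide.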