On Dec 12, 2011, at 3:06 PM, Brian Tingle wrote:

> On Mon, Dec 12, 2011 at 10:56 AM, Michael B. Klein wrote:
>
>> Here's a snippet that will completely randomize the contents of an
>> arbitrary string while retaining the general flow (vowels replaced with
>> vowels, consonants replaced with consonants (with case retained in both
>> instances), digits replaced with digits, and everything else left alone).
>>
>> https://gist.github.com/1468557
>
> I like the way the output looks, but one problem with the random output is
> that the same word might come out to different values. The distribution of
> unique words would also be affected; not sure if that would
> impact relevance/searching/index size. Also, I was sort of hoping to be
> able to have some sort of browsing, so I'm looking for something that is
> like a pronounceable one-way hash. Maybe if I take the md5 of the
> word, and then use that as the seed for random, and then run
> your algorithm, then NASA would always "hash" to the same thing?

If the list of missions / agencies / etc. is rather small, it'd be possible
to just come up with a random list of nouns and make a sort of secret
decoder ring, assigning each mission name that needs to be replaced a
random (but consistent) word.

I just tend to replace all of my mission / spacecraft / instrument acronyms
with 'BOGUS' when I have to do similar stuff to generate records when we're
testing data systems, but I tend to just have the acronyms, not the full
spelled-out names (which are looked up from the acronyms), and I don't have
large amounts of free text to worry about.

-Joe
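
For what it's worth, here is a rough Python sketch of the two ideas in this thread: seeding the random generator from the md5 of each word so that a given word always scrambles to the same thing, and the "decoder ring" lookup table. It is not the code from the gist; the names `scramble_word`, `scramble_text`, and `decoder_ring` are made up for illustration, and the output is only guaranteed to repeat within one Python version, since the standard library does not promise that `random.Random.choice` or `shuffle` yield the same sequence across releases.

```python
import hashlib
import random
import re

VOWELS = "aeiou"
CONSONANTS = "bcdfghjklmnpqrstvwxyz"
DIGITS = "0123456789"


def scramble_word(word: str) -> str:
    """Scramble one word, keeping its vowel/consonant/digit shape and case."""
    # Seed a private generator from the md5 of the lowercased word, so the
    # same word gives the same result every time, whatever its case.
    digest = hashlib.md5(word.lower().encode("utf-8")).hexdigest()
    rng = random.Random(int(digest, 16))
    out = []
    for ch in word:
        lower = ch.lower()
        if lower in VOWELS:
            repl = rng.choice(VOWELS)
        elif lower in CONSONANTS:
            repl = rng.choice(CONSONANTS)
        elif ch in DIGITS:
            repl = rng.choice(DIGITS)
        else:
            out.append(ch)  # punctuation etc. is left alone
            continue
        out.append(repl.upper() if ch.isupper() else repl)
    return "".join(out)


def scramble_text(text: str) -> str:
    """Scramble every ASCII alphanumeric run in the text independently."""
    return re.sub(r"[A-Za-z0-9]+", lambda m: scramble_word(m.group(0)), text)


def decoder_ring(names, nouns, seed=0):
    """Map each distinct name to a distinct noun, repeatably for a given seed."""
    unique_names = sorted(set(names))
    if len(nouns) < len(unique_names):
        raise ValueError("need at least as many nouns as names")
    shuffled = sorted(nouns)
    random.Random(seed).shuffle(shuffled)
    return dict(zip(unique_names, shuffled))
```

Calling `scramble_text` on the free text and `decoder_ring` on the short list of mission names would cover both cases. Note that because the alphabet is so small, two different words can scramble to the same output, and short words could be recovered by brute force, so this is obfuscation for test data rather than anything secure.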