On Dec 12, 2011, at 3:06 PM, Brian Tingle wrote:

> On Mon, Dec 12, 2011 at 10:56 AM, Michael B. Klein wrote:
> 
>> Here's a snippet that will completely randomize the contents of an
>> arbitrary string while preserving the general flow: vowels are replaced
>> with vowels and consonants with consonants (with case retained in both
>> instances), digits are replaced with digits, and everything else is left
>> alone.
>> 
>> https://gist.github.com/1468557
> 
> 
> I like the way the output looks, but one problem with the random output is
> that the same word might come out to a different value each time.  The
> distribution of unique words would also be affected; I'm not sure whether
> that would impact relevance, searching, or index size.  Also, I was hoping
> to be able to offer some sort of browsing, so I'm looking for something
> like a pronounceable one-way hash.  Maybe if I take the md5 of the
> word, use that as the seed for random, and then run
> your algorithm, NASA would always "hash" to the same thing?
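
For what it's worth, the md5-as-seed idea could be sketched roughly as
below.  This is my own guess at it in Python, not the code from the gist;
the scramble() name and the ASCII-only character classes are assumptions
made for illustration.  The seed is taken from the lowercased word, so
"NASA" and "nasa" get the same letters in different case.

```python
import hashlib
import random

VOWELS = "aeiou"
CONSONANTS = "bcdfghjklmnpqrstvwxyz"
DIGITS = "0123456789"


def scramble(word: str) -> str:
    """Replace vowels, consonants and digits with same-class characters.

    Hypothetical helper: the random generator is seeded from the MD5 of
    the lowercased word, so the same word always gives the same output.
    """
    digest = hashlib.md5(word.lower().encode("utf-8")).hexdigest()
    rng = random.Random(int(digest, 16))  # private RNG, global state untouched
    out = []
    for ch in word:
        low = ch.lower()
        if low in VOWELS:
            new = rng.choice(VOWELS)
        elif low in CONSONANTS:
            new = rng.choice(CONSONANTS)
        elif ch in DIGITS:
            new = rng.choice(DIGITS)
        else:
            new = ch  # punctuation, whitespace, non-ASCII: left alone
        out.append(new.upper() if ch.isupper() else new)
    return "".join(out)


if __name__ == "__main__":
    for w in ("NASA", "nasa", "Apollo 11"):
        print(w, "->", scramble(w))
```

Two caveats: the output should be stable on a given Python version, but I
wouldn't count on it staying identical across versions; and for a short
list of known names this is not much of a one-way hash, since anyone with
the script and a list of candidate names can rebuild the mapping.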

If the list of missions / agencies / etc is rather small, it'd be possible to
just come up with a random list of nouns, and make a sort of secret
decoder ring, assigning each mission name that needs to be replaced
with a random (but consistent) word.
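
A rough sketch of the decoder ring, assuming Python; the function names,
the sample nouns and the seed parameter are all made up for illustration.
The point is only that the mapping is built once and then reused, so each
name always gets the same replacement.

```python
import random
import re


def build_decoder_ring(names, nouns, seed=0):
    """Assign each distinct name its own replacement noun.

    Hypothetical helper: raises ValueError if there are too few nouns.
    """
    unique = sorted(set(names))
    if len(nouns) < len(unique):
        raise ValueError("need at least one noun per distinct name")
    rng = random.Random(seed)
    picks = rng.sample(list(nouns), len(unique))  # no noun is used twice
    return dict(zip(unique, picks))


def redact(text, ring):
    """Replace whole-word occurrences of each name using the ring."""
    if not ring:
        return text
    # Longest names first, so a longer name wins over a shorter prefix.
    keys = sorted(ring, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: ring[m.group(1)], text)


if __name__ == "__main__":
    ring = build_decoder_ring(
        ["NASA", "ESA", "JAXA"],
        ["walnut", "harbor", "lantern", "pebble"],
    )
    print(redact("NASA and ESA signed the agreement; NASA hosts it.", ring))
```

In practice you'd probably want to save the ring somewhere private and
reload it, rather than rebuild it, since adding a name to the list can
change which noun the existing names get.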

I just tend to replace all of my mission / spacecraft / instrument acronyms
with 'BOGUS' when I have to do similar stuff to generate records for
testing data systems.  But I tend to have only the acronyms, not the
fully spelled-out names (which are looked up from the acronyms), and I
don't have large amounts of free text to worry about.

-Joe