On Dec 12, 2011, at 6:35 PM, Michael B. Klein wrote:
> I've altered my previous function (https://gist.github.com/1468557) into
> something that's pretty much a straight letter-substitution cipher.
This is what I ended up using:
https://github.com/tingletech/greeker.py/blob/3ba1e84bc1ea51fa501c1a479f8758593bac5ffd/greeker.py#L131-150
It uses a different straight letter substitution for every unique word, using the input word as the seed for the random number generator, so the same word always comes out the same way.
It does not look as pretty as your code.
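
Roughly, the idea looks like the sketch below. This is a minimal illustration of seeding a substitution alphabet from the word itself, not the code at the link above; the function name `greek_word` and the choice to seed on the lowercased word are assumptions made for the example.

```python
import random
import string


def greek_word(word):
    """Scramble one word with a substitution alphabet derived from the word."""
    # Assumption: seed on the lowercased word so "Library" and "library"
    # share one alphabet. The same seed always yields the same shuffle.
    rng = random.Random(word.lower())
    shuffled = list(string.ascii_lowercase)
    rng.shuffle(shuffled)
    table = dict(zip(string.ascii_lowercase, shuffled))

    out = []
    for ch in word:
        # Characters outside a-z (digits, hyphens, accents) pass through.
        sub = table.get(ch.lower(), ch)
        out.append(sub.upper() if ch.isupper() else sub)
    return "".join(out)


if __name__ == "__main__":
    for w in ["Library", "library", "catalog", "catalog"]:
        print(w, "->", greek_word(w))
```

Because the alphabet depends only on the word, repeated words still collapse to one token in an index, while two different words never share a predictable mapping.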
> But if you really want it to index
> realistically, it would need to be altered to leave common stems (-s, -ies,
> -ed, -ing, etc.) alone (assuming the indexer uses some sort of stemming
> algorithm).
I'm only doing nouns, and I'm matching inflection. I guess I could investigate stemming as well.
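
If I did go the "leave common stems alone" route, it might look something like the sketch below. The suffix list, the minimum stem length, and the names `split_suffix` and `scramble_keep_suffix` are all assumptions for illustration; a real indexer's stemmer (Porter or otherwise) may strip different endings.

```python
import random
import string

# Illustrative suffixes only, checked longest first so "-ies" wins over "-s".
SUFFIXES = ("ies", "ing", "ed", "s")


def split_suffix(word):
    """Return (stem, suffix); suffix is "" when none of SUFFIXES applies."""
    for suffix in SUFFIXES:
        # Assumption: require a stem of at least 3 letters so short words
        # such as "is" or "red" are not split.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)], suffix
    return word, ""


def scramble_keep_suffix(word):
    """Substitute letters in the stem only, leaving the suffix readable."""
    stem, suffix = split_suffix(word.lower())
    rng = random.Random(stem)  # seed on the stem so inflections line up
    letters = list(string.ascii_lowercase)
    rng.shuffle(letters)
    table = str.maketrans(string.ascii_lowercase, "".join(letters))
    return stem.translate(table) + suffix


if __name__ == "__main__":
    for w in ["book", "books", "indexing", "libraries"]:
        print(w, "->", scramble_keep_suffix(w))
```

Seeding on the stem rather than the whole word means "book" and "books" scramble to the same stem, which is what a stemming indexer would want to see.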
I'd still like to play with substituting nouns using a dictionary of nouns of the same length, but I have not found a dictionary of nouns to use. I thought I would find one in nltk somewhere, but I did not figure out how to use wordnet when I looked at it.
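
The same-length substitution itself is simple once a noun list exists. The sketch below uses a tiny hard-coded list as a stand-in; `NOUNS`, `index_by_length`, and `swap_noun` are hypothetical names, and a real run would need an actual noun dictionary (possibly one built from NLTK's WordNet corpus, which I believe can enumerate noun synsets with `wordnet.all_synsets('n')`, though that is not exercised here).

```python
import random
from collections import defaultdict

# Hypothetical stand-in list; replace with a real dictionary of nouns.
NOUNS = ["book", "fish", "lamp", "tree", "river", "stone", "chair", "cloud"]


def index_by_length(nouns):
    """Group nouns by length so candidates can be looked up quickly."""
    by_length = defaultdict(list)
    for noun in nouns:
        by_length[len(noun)].append(noun)
    return by_length


def swap_noun(word, by_length):
    """Pick a replacement noun of the same length, stable per input word."""
    candidates = by_length.get(len(word))
    if not candidates:
        return word  # no noun of that length: leave the word as it is
    # Seeding on the word keeps the choice repeatable across the document.
    return random.Random(word).choice(candidates)


if __name__ == "__main__":
    table = index_by_length(NOUNS)
    for w in ["desk", "table", "encyclopedia"]:
        print(w, "->", swap_noun(w, table))
```

Inflection would still need to be handled separately, for example by swapping the singular form and then reapplying the original ending.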