Stuart Yeates writes:
> Thomas Krichel wrote:
...
> > It will try to guess between UTF-8 and ISO-8859-1. This can be done
> > because UTF-8 has many invalid byte sequences. But if, say, you
> > wanted to guess between ISO-8859-1 and ISO-8859-2, you'd be out of
> > luck.
>
> Not necessarily.
I meant you would be out of luck with the tool I proposed.
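A minimal Python sketch of the UTF-8 versus ISO-8859-1 guess described above. The helper name guess_encoding is hypothetical and this is not the code of the proposed tool. It tries a strict UTF-8 decode and falls back to ISO-8859-1, which shows why the trick stops working for ISO-8859-1 versus ISO-8859-2.

```python
def guess_encoding(data: bytes) -> str:
    """Guess between UTF-8 and ISO-8859-1 only (hypothetical helper)."""
    try:
        # Strict decoding raises on any byte sequence that is invalid UTF-8.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Python's ISO-8859-1 codec decodes every byte 0x00-0xFF, so this
        # fallback can never be ruled out by a validity check.
        return "iso-8859-1"


print(guess_encoding("é".encode("utf-8")))       # utf-8
print(guess_encoding("é".encode("iso-8859-1")))  # iso-8859-1

# The same byte is valid in both ISO-8859-1 and ISO-8859-2, only with a
# different meaning, so a validity check cannot tell the two apart.
print(b"\xb1".decode("iso-8859-1"), b"\xb1".decode("iso-8859-2"))  # ± ą
```

Pure ASCII input is valid in both encodings. The sketch simply reports it as UTF-8, which decodes it identically.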
> There are tools such as http://www.let.rug.nl/~vannoord/TextCat/
> which provide very reliable guessing of languages.
I am happy to read this; I have needed language
detection several times already.
But detecting the language of a text is a somewhat different
problem from detecting its character encoding.
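To make the difference concrete: a language guesser works on already decoded characters, not on raw bytes. As far as I know, TextCat implements the character n-gram method of Cavnar and Trenkle. The following is a simplified Python sketch of that idea, not TextCat's code. The function names, the 1- to 3-gram range, the 300-entry cut-off and the two tiny training sentences are all assumptions for illustration.

```python
from collections import Counter


def ngram_profile(text: str, n_max: int = 3, top: int = 300) -> list[str]:
    """Character n-grams (1..n_max), most frequent first (simplified)."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    return [gram for gram, _ in counts.most_common(top)]


def out_of_place(doc: list[str], lang: list[str]) -> int:
    """Sum of rank differences; n-grams absent from lang get a max penalty."""
    rank = {gram: i for i, gram in enumerate(lang)}
    penalty = len(lang)
    return sum(abs(i - rank[g]) if g in rank else penalty
               for i, g in enumerate(doc))


def guess_language(text: str, profiles: dict[str, list[str]]) -> str:
    """Pick the language whose profile is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))


# Hypothetical toy training data; real profiles need far more text.
EN_SAMPLE = "the quick brown fox jumps over the lazy dog and the cat"
DE_SAMPLE = "der schnelle braune fuchs springt über den faulen hund und die katze"
PROFILES = {"en": ngram_profile(EN_SAMPLE), "de": ngram_profile(DE_SAMPLE)}

print(guess_language("the dog and the fox", PROFILES))
```

Note that the input is a str: the bytes have to be decoded first, so the encoding question must already be settled before this kind of language guessing can start.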
Cheers,
Thomas Krichel http://openlib.org/home/krichel
http://authorclaim.org/profile/pkr1
skype: thomaskrichel