LISTSERV 16.5 - CODE4LIB Archives

If you're looking for a book-length treatment, 'Unicode Explained' is 
fairly readable, and the first three chapters are about character 
encodings in general:

http://books.google.com/books?id=PcWU2yxc8WkC&printsec=frontcover
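
For what it's worth, both samples in the quoted message below look like UTF-8 seen through the wrong lens. Octal 303 255 is the byte pair 0xC3 0xAD, which is the UTF-8 encoding of "í", and "Ãº" is what the UTF-8 bytes for "ú" (0xC3 0xBA) look like when they are read as Latin-1. Here is a minimal Python sketch of both round trips. The helper names are made up for illustration, and it assumes the strings contain literal backslash escapes or Latin-1-misread text; the bytes actually stored in the table may differ, so check a few rows before converting anything in bulk.

```python
import re

def bytes_from_octal_escapes(text: str) -> bytes:
    """Hypothetical helper: turn literal \\NNN octal escapes into raw bytes.

    Assumes the input is plain ASCII apart from the escapes themselves.
    """
    return re.sub(
        rb"\\([0-3][0-7]{2})",                        # \000 through \377
        lambda m: bytes([int(m.group(1).decode("ascii"), 8)]),
        text.encode("ascii"),
    )

def fix_latin1_mojibake(text: str) -> str:
    """Hypothetical helper: undo UTF-8 bytes that were decoded as Latin-1."""
    return text.encode("latin-1").decode("utf-8")

# First sample: octal escapes as emacs displays them.
escaped = r"Revista de Oncolog\303\255a"
title_one = bytes_from_octal_escapes(escaped).decode("utf-8")

# Second sample: "\u00c3\u00ba" is the pair of characters shown as "Ãº".
mojibake = "Revista de M\u00c3\u00basica Latinoamericana"
title_two = fix_latin1_mojibake(mojibake)

print(title_one)
print(title_two)
```

If fix_latin1_mojibake raises UnicodeDecodeError or UnicodeEncodeError on a row, that row was probably not damaged in this particular way and is better left alone until you know its real encoding.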

On 12/16/2009 12:02 PM, Ken Irwin wrote:
> Hi all,
>
> I'm looking for a good source to help me understand character sets and how to use them. I pretty much know nothing about this - the whole world of Unicode, ASCII, octal, UTF-8, etc. is baffling to me.
>
> My immediate issue is that I think I need to integrate data from a variety of character sets into one MySQL table - I expect I need some way to convert from one to another, but I don't really even know how to tell which data are in which format.
>
> Our homegrown journal list (akin to SerialsSolutions) includes data ingested from publishers, vendors, the library catalog (III), etc. When I look at the data in emacs, some of it renders like this:
>   Revista de Oncolog\303\255a                  [slashes-and-digits instead of diacritics]
> And other data looks more like:
>   Revista de MÃºsica Latinoamericana    [weird characters instead of diacritics]
>
> My MySQL table is currently set up with the collation set to utf8_bin, and the titles from the second category (weird characters displayed in emacs) render properly when the database data is output to a web browser. The data from the former example (\###) renders as an "I don't know what character this is" placeholder in Firefox and IE.
>
> So, can someone please point me toward any or all of the following?
>
> · A good primer for understanding all of this stuff
>
> · A method for converting all of my data to the same character set so it plays nicely in the database
>
> · The names of the character sets I might be working with here
>
> Many thanks!
>
> Ken