A classic general overview (on the topic of "what the heck ARE
character sets???"):

http://www.joelonsoftware.com/articles/Unicode.html
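The primer's central point is that a character is not the same thing as the bytes that store it. The Python sketch below makes that concrete. It encodes "í" in two different encodings and prints the bytes as a backslash followed by three octal digits, which is how emacs displays bytes it has not decoded. The `as_octal` helper is a hypothetical name used only for this illustration.

```python
# The same character has different byte representations in different encodings.
ch = "í"  # U+00ED LATIN SMALL LETTER I WITH ACUTE

utf8_bytes = ch.encode("utf-8")      # two bytes: 0xC3 0xAD
latin1_bytes = ch.encode("latin-1")  # one byte: 0xED


def as_octal(data: bytes) -> str:
    """Render bytes as emacs shows undecoded bytes: a backslash plus three octal digits."""
    return "".join(f"\\{b:03o}" for b in data)


print(f"U+{ord(ch):04X}")     # U+00ED
print(as_octal(utf8_bytes))   # \303\255
print(as_octal(latin1_bytes)) # \355

# Decoding UTF-8 bytes with the wrong codec produces "mojibake":
# here 'Ã' followed by an invisible soft hyphen (U+00AD).
print(repr(utf8_bytes.decode("latin-1")))
```

Read this way, `\303\255` in an emacs buffer is most likely ordinary UTF-8 for "í" that the editor has not been told to decode. "Octal" is therefore a display notation, not a separate character set.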



On Wed, Dec 16, 2009 at 11:02 AM, Ken Irwin <[log in to unmask]> wrote:
> Hi all,
>
> I'm looking for a good source to help me understand character sets and how to use them. I pretty much know nothing about this - the whole world of Unicode, ASCII, octal, UTF-8, etc. is baffling to me.
>
> My immediate issue is that I think I need to integrate data from a variety of character sets into one MySQL table - I expect I need some way to convert from one to another, but I don't really even know how to tell which data are in which format.
>
> Our homegrown journal list (akin to Serials Solutions) includes data ingested from publishers, vendors, the library catalog (III), etc. When I look at the data in emacs, some of it renders like this:
>  Revista de Oncolog\303\255a                  [slashes-and-digits instead of diacritics]
> And other data looks more like:
>  Revista de Música Latinoamericana    [weird characters instead of diacritics]
>
> My MySQL table is currently set up with the collation set to utf8_bin, and the titles from the second category (weird characters in emacs) render properly when the database data is output to a web browser. The data from the first example (\###) renders as an "I don't know what character this is" placeholder in Firefox and IE.
>
> So, can someone please point me toward any or all of the following?
>
> ·  A good primer for understanding all of this stuff
> ·  A method for converting all of my data to the same character set so it plays nicely in the database
> ·  The names of the character sets I might be working with here
>
> Many thanks!
>
> Ken
>
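One common heuristic for converting mixed input to a single encoding is sketched below in Python, under stated assumptions. First, try to decode each incoming value as UTF-8. Only if that fails, fall back to a guessed legacy encoding. Then normalize the text and write everything back out as UTF-8.

The `to_utf8` helper and the Latin-1 fallback are hypothetical choices for illustration, not something established in this thread. The right fallback depends on each source:

- Windows-1252 (`cp1252`) is often a better guess for vendor files.
- Library catalog exports are sometimes in MARC-8, which Python's standard library has no codec for.

The heuristic works because 8-bit legacy text containing accented letters is rarely also well-formed UTF-8. That is likely but not guaranteed.

```python
import unicodedata

# Assumption: any input that is not valid UTF-8 is Latin-1 (ISO-8859-1).
# Check each data source before trusting this; cp1252 or MARC-8 may apply instead.
FALLBACK_ENCODING = "latin-1"


def to_utf8(raw: bytes, fallback: str = FALLBACK_ENCODING) -> bytes:
    """Return raw bytes re-encoded as NFC-normalized UTF-8 (hypothetical helper)."""
    try:
        # Strict decode: raises UnicodeDecodeError if the bytes are not well-formed UTF-8.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # With the default Latin-1 fallback this cannot fail,
        # because Latin-1 assigns a character to every byte value.
        text = raw.decode(fallback)
    # NFC turns "i" + combining acute accent into the single code point "í".
    return unicodedata.normalize("NFC", text).encode("utf-8")


if __name__ == "__main__":
    samples = [
        b"Revista de Oncolog\303\255a",           # UTF-8 bytes, as in the emacs example
        b"Revista de M\372sica Latinoamericana",  # hypothetical Latin-1 input (0xFA is "ú")
    ]
    for raw in samples:
        print(to_utf8(raw).decode("utf-8"))
```

Converted values should then be sent to MySQL over a connection that is itself set to UTF-8 (for example with `SET NAMES utf8`). Otherwise the server may transcode the bytes a second time on their way into the `utf8` column.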