Hi Ken,

In an effort to better understand character sets myself, I have brought together some information on my website, with an emphasis on library automation and the internet environment:
  
  Coded Character Sets > A Technical Primer for Librarians
  http://rocky.uta.edu/doran/charsets/

Make sure you look at the "Resources on the Web" page, too (http://rocky.uta.edu/doran/charsets/resources.html).

The quote about character sets that most resonated with me was "An apparently simple subject which turns out to be brutally complicated."  They are definitely worth learning about, though!  Have fun.
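As a small illustration of the two samples in your message, here is a rough Python sketch. It rests on assumptions I can't verify from here: that the \303\255 sequences are octal escapes for UTF-8 bytes, and that anything which is not valid UTF-8 is Latin-1 (ISO-8859-1). The function names are made up for the example, so treat it as a starting point rather than a finished converter.

```python
import re


def decode_octal_escapes(text: str) -> str:
    """Turn literal \\ooo octal escapes into bytes, then decode as UTF-8.

    Assumes the rest of the text is plain ASCII/Latin-1 and that the
    escaped bytes form valid UTF-8 (an assumption, not a certainty).
    """
    raw = re.sub(
        rb"\\([0-3][0-7]{2})",                  # backslash + 3 octal digits (0-255)
        lambda m: bytes([int(m.group(1), 8)]),  # e.g. \303 -> byte 0xC3
        text.encode("latin-1"),
    )
    return raw.decode("utf-8")


def to_text(raw: bytes) -> str:
    """Heuristic: try UTF-8 first, fall back to Latin-1 if that fails."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


# \303\255 are the bytes 0xC3 0xAD, which is UTF-8 for "í".
print(decode_octal_escapes(r"Revista de Oncolog\303\255a"))

# UTF-8 bytes displayed as if they were Latin-1 give the "weird characters".
print("Música".encode("utf-8").decode("latin-1"))

# Bytes from a Latin-1 source and from a UTF-8 source end up as the same text.
print(to_text(b"M\xfasica"), to_text("Música".encode("utf-8")))
```

The fallback in to_text is only a guess: Latin-1 will decode any byte sequence without complaint, so data in some other legacy encoding would come out wrong without raising an error.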

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# [log in to unmask]
# http://rocky.uta.edu/doran/
 

> -----Original Message-----
> From: Code for Libraries [mailto:[log in to unmask]] On Behalf Of
> Ken Irwin
> Sent: Wednesday, December 16, 2009 11:02 AM
> To: [log in to unmask]
> Subject: [CODE4LIB] character-sets for dummies?
> 
> Hi all,
> 
> I'm looking for a good source to help me understand character sets and
> how to use them. I pretty much know nothing about this - the whole
> world of Unicode, ASCII, octal, UTF-8, etc. is baffling to me.
> 
> My immediate issue is that I think I need to integrate data from a
> variety of character sets into one MySQL table - I expect I need some
> way to convert from one to another, but I don't really even know how to
> tell which data are in which format.
> 
> Our homegrown journal list (akin to SerialsSolutions) includes data
> ingested from publishers, vendors, the library catalog (III), etc. When
> I look at the data in emacs, some of it renders like this:
>  Revista de Oncolog\303\255a                  [slashes-and-digits
> instead of diacritics]
> And other data looks more like:
>  Revista de Música Latinoamericana    [weird characters instead of
> diacritics]
> 
> My MySQL table is currently set up with the collation set to utf8_bin,
> and the titles from the second category (weird characters displayed in
> emacs) render properly when the database data is output to a web
> browser. The data from the former example (\###) renders as an "I don't
> know what character this is" placeholder in Firefox and IE.
> 
> So, can someone please point me toward any or all of the following?
> 
> * A good primer for understanding all of this stuff
> 
> * A method for converting all of my data to the same character
>   set so it plays nicely in the database
> 
> * The names of which character-sets I might be working with
>   here
> 
> Many thanks!
> 
> Ken