Hi Stuart,

> These have been included because they are in widespread use in a current 
> written culture. The problems I personally have are down to characters 
> used by a single publisher in a handful of books more than a hundred 
> years ago. Such characters are explicitly excluded from Unicode.
> 
> In the early period of the standardisation of the Māori language there
> were several competing ideas of what to use as a character set. One of
> those included a 'wh' ligature as a character. Several works were
> printed using this ligature. This ligature does not qualify for
> inclusion in Unicode.

That is open to discussion. If you do not call it a 'ligature', the
chances of getting it included are higher.

> To see how we handle the text, see:
> 
> http://www.nzetc.org/tm/scholarly/tei-Auc1911NgaM-t1-body-d4.html
> 
> The underlying representation is TEI/XML, which has a mechanism to
> handle such glyphs. The things I'm still unhappy with are:
> 
> * getting reasonable results when users cut-n-paste the text/image HTML
> combination to some other application
> * some browsers still like line-breaking on images in the middle of words

That's interesting, and it reminds me of the treatment of mathematical
formulae in journal titles, which mostly end up as ugly images.

Unicode reserves Private Use Areas (e.g. U+E000 to U+F8FF) where you
may assign your own characters:

http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Private_use_characters
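A minimal sketch of what a private-use assignment could look like, and
how a plain-text fallback might help with the cut-n-paste problem. The
choice of U+E000 for the 'wh' ligature and the names WH_LIGATURE,
PUA_FALLBACK and to_plain_text are hypothetical, for illustration only;
nobody outside your project will know what U+E000 means.

```python
import unicodedata

# Hypothetical private agreement: U+E000 (first code point of the BMP
# Private Use Area) stands for the Maori 'wh' ligature. Other software
# and fonts will not know this meaning.
WH_LIGATURE = "\uE000"

# Fallback from private-use characters to ordinary Unicode text, e.g. for
# search indexes or text that users copy into other applications.
PUA_FALLBACK = {WH_LIGATURE: "wh"}


def is_private_use(ch: str) -> bool:
    """Return True if ch is a private-use character (general category 'Co')."""
    return unicodedata.category(ch) == "Co"


def to_plain_text(text: str) -> str:
    """Replace known private-use characters with their plain-text fallback."""
    return "".join(PUA_FALLBACK.get(ch, ch) for ch in text)


word = WH_LIGATURE + "are"  # 'whare' written with the private ligature
print(is_private_use(WH_LIGATURE), to_plain_text(word))
```

The private character only displays correctly with a font that maps
U+E000 to a 'wh' glyph, so the fallback text is what you would offer
wherever that font cannot be assumed.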

You could also put U+200D ZERO WIDTH JOINER between the 'w' and the
'h', but most browsers will not render it as a ligature; you need a font
that supports your character anyway.

http://blogs.msdn.com/michkap/archive/2006/02/15/532394.aspx
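A rough sketch of the zero-width-joiner approach. The text stays
ordinary letters plus an invisible format character (general category
'Cf'), so it remains readable after copying even where no ligature is
drawn. Whether a renderer actually shows a 'wh' ligature depends on the
font. The helper names request_ligature and strip_joiners are
hypothetical, and this version ignores capitalised 'Wh'.

```python
import unicodedata

ZWJ = "\u200D"  # ZERO WIDTH JOINER


def request_ligature(text: str, pair: str = "wh") -> str:
    """Insert a ZWJ between the letters of every occurrence of pair."""
    return text.replace(pair, pair[0] + ZWJ + pair[1:])


def strip_joiners(text: str) -> str:
    """Remove ZWJs again, e.g. before indexing or searching."""
    return text.replace(ZWJ, "")


marked = request_ligature("whare")
print(repr(marked), unicodedata.category(ZWJ), strip_joiners(marked))
```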

In summary: Unicode covers only a subset of all the characters that have
been used for written communication, and whether a character gets
included depends not only on its objective properties but also on
lobbying and other circumstances. The deeper you dig, the nastier
Unicode gets, as with all complex formats and standards.

Cheers
Jakob

P.S.: Michael Kaplan's blog also contains a funny article about emoji:
http://blogs.msdn.com/michkap/archive/2010/04/27/10002948.aspx

-- 
Jakob Voß <[log in to unmask]>, skype: nichtich
Verbundzentrale des GBV (VZG) / Common Library Network
Platz der Goettinger Sieben 1, 37073 Göttingen, Germany
+49 (0)551 39-10242, http://www.gbv.de