LISTSERV 16.5 - CODE4LIB Archives

Jakob Voss wrote:
> Eric Hellman wrote:
> 
>> May I just add here that of all the things we've talked about in
>> these threads, perhaps the only thing that will still be in use a
>> hundred years from now will be Unicode. إن شاء الله
> 
> Stuart Yeates wrote:
> 
>  > Sadly, yes, I agree with you on this.
>  >
>  > Do you have any idea how demotivating that is for those of us
>  > maintaining collections with works containing characters that don't
>  > qualify for inclusion?
> 
> May I just add there that Unicode is evolving too and you can help to 
> get missing characters included. One of the next updates will even 
> include hundreds of icons such as a slice of pizza, a kissing couple,
> and Mount Fuji (see this zipped PDF: http://is.gd/bABl9 and
> http://en.wikipedia.org/wiki/Emoji).

Indeed.

These have been included because they are in widespread use in a current 
written culture. The problems I personally have are down to characters 
used by a single publisher in a handful of books more than a hundred 
years ago. Such characters are explicitly excluded from Unicode.

In the early period of the standardisation of the Māori language there
were several competing ideas of what to use as a character set. One of
those included a 'wh' ligature as a character. Several works were
printed using this ligature. This ligature does not qualify for
inclusion in Unicode.

To see how we handle the text, see:

http://www.nzetc.org/tm/scholarly/tei-Auc1911NgaM-t1-body-d4.html

The underlying representation is TEI/XML, which has a mechanism to
handle such glyphs. The things I'm still unhappy with are:

* getting reasonable results when users cut and paste the text/image HTML
combination into some other application
* some browsers still break lines at images in the middle of words
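
For anyone wanting to experiment with this, here is a minimal sketch of one way a TEI glyph reference could be turned into HTML. It is not the NZETC pipeline. The markup in SAMPLE assumes the TEI P5 character-declaration elements (<charDecl>, <glyph>, <mapping>, <g>), and the glyph id "wh-lig", the image name "wh-lig.png", the sample phrase and both function names are hypothetical. Each glyph image gets its plain-text fallback as alt text, which some applications substitute when the image is lost in a paste, and each word containing a glyph is wrapped in a no-wrap span so a browser has no break opportunity at the image.

```python
import re
import xml.etree.ElementTree as ET
from html import escape

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
PLACEHOLDER = "\ue000"  # private-use code point standing in for one glyph

# Hypothetical sample document; ids, file names and text are illustrative only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><encodingDesc><charDecl>
    <glyph xml:id="wh-lig">
      <desc>wh ligature</desc>
      <mapping type="standardized">wh</mapping>
      <graphic url="wh-lig.png"/>
    </glyph>
  </charDecl></encodingDesc></teiHeader>
  <text><body><p>te <g ref="#wh-lig">wh</g>are</p></body></text>
</TEI>"""


def load_glyphs(root):
    """Map each declared glyph id to (fallback text, image URL)."""
    glyphs = {}
    for glyph in root.iter(f"{{{TEI_NS}}}glyph"):
        mapping = glyph.find(f"{{{TEI_NS}}}mapping")
        graphic = glyph.find(f"{{{TEI_NS}}}graphic")
        glyphs[glyph.get(XML_ID)] = (
            mapping.text if mapping is not None else "",
            graphic.get("url") if graphic is not None else "",
        )
    return glyphs


def render_paragraph(p, glyphs):
    """Render a <p> holding text and <g> children as an HTML paragraph."""
    refs = []
    chunks = [p.text or ""]
    for child in p:
        if child.tag == f"{{{TEI_NS}}}g":
            refs.append(child.get("ref").lstrip("#"))
            chunks.append(PLACEHOLDER)
        else:
            # Other inline elements are flattened to their text content.
            chunks.append("".join(child.itertext()))
        chunks.append(child.tail or "")

    pending = iter(refs)
    out = []
    # Split on whitespace but keep it, so the original spacing survives.
    for token in re.split(r"(\s+)", "".join(chunks)):
        if PLACEHOLDER not in token:
            out.append(escape(token))
            continue
        word = ""
        for part in re.split(f"({PLACEHOLDER})", token):
            if part == PLACEHOLDER:
                fallback, url = glyphs[next(pending)]
                word += f'<img src="{escape(url)}" alt="{escape(fallback)}">'
            else:
                word += escape(part)
        # Keep the whole word on one line so it cannot break at the image.
        out.append(f'<span style="white-space: nowrap">{word}</span>')
    return "<p>" + "".join(out) + "</p>"


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    paragraph = root.find(f".//{{{TEI_NS}}}p")
    print(render_paragraph(paragraph, load_glyphs(root)))
```

Whether the alt text actually survives a paste depends on the browser and the receiving application, so this may reduce the first problem rather than solve it.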

cheers
stuart
-- 
Stuart Yeates
http://www.nzetc.org/       New Zealand Electronic Text Centre
http://researcharchive.vuw.ac.nz/     Institutional Repository