[I realise there was a related 'Character-sets for dummies'[1]
discussion recently]

I am using the tictocs[2] list of journal RSS feeds, and I am getting
gibberish in places where diacritics should be. Below is an example:

in emacs:
 221	Acta Ortop  dica Brasileira	http://www.scielo.br/rss.php?pid=1413-7852&lang=en	1413-7852	
in Firefox:
 221	Acta Ortop  dica Brasileira	http://www.scielo.br/rss.php?pid=1413-7852&lang=en	1413-7852

Note that the emacs view is the same for both a file saved from
Firefox and a direct download using 'wget'.

Is this something on my end, or are the tictocs people not serving
proper UTF-8? 

The HTTP header from wget claims UTF-8:
> wget -S http://www.tictocs.ac.uk/text.php
> --2009-12-21 12:47:59--  http://www.tictocs.ac.uk/text.php
> Resolving www.tictocs.ac.uk... 130.88.101.131
> Connecting to www.tictocs.ac.uk|130.88.101.131|:80... connected.
> HTTP request sent, awaiting response... 
>   HTTP/1.1 200 OK
>   Date: Mon, 21 Dec 2009 17:42:05 GMT
>   Server: Apache/2.2.13 (Unix) mod_ssl/2.2.13 OpenSSL/0.9.8k PHP/5.3.0 DAV/2
>   X-Powered-By: PHP/5.3.0
>   Content-Type: text/plain; charset=utf-8
>   Connection: close
> Length: unspecified [text/plain]
><....stuff removed>
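
One way to check the body independently of the header would be to decode
the downloaded bytes strictly as UTF-8 and list the lines that fail. Below
is a minimal sketch in Python; the function name find_invalid_utf8_lines
and the sample bytes are made up for illustration, and the Latin-1 sample
is only an assumption about one way such a file could go wrong, not a
claim about what tictocs actually serves.

```python
def find_invalid_utf8_lines(data: bytes):
    """Return (line_number, raw_bytes) for every line that is not valid UTF-8."""
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            # Strict decoding raises on any byte sequence that is not UTF-8.
            line.decode("utf-8", errors="strict")
        except UnicodeDecodeError:
            bad.append((number, line))
    return bad


# Illustrative samples (assumed, not taken from the real feed list):
# the same journal title encoded once as UTF-8 and once as Latin-1.
title = "Acta Ortop\u00e9dica Brasileira"
good_sample = title.encode("utf-8")
bad_sample = title.encode("latin-1")

print(find_invalid_utf8_lines(good_sample))  # no invalid lines expected
print(find_invalid_utf8_lines(bad_sample))   # line 1 should be reported
```

To run this against the real file, one would read the wget download in
binary mode (open(path, "rb").read()) and pass the bytes to the function.
If no lines are reported, the file is valid UTF-8 and the problem is more
likely in how the data was encoded before it was served.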

Can someone confirm whether they are also seeing this issue?

Thanks,
Glen

[1]https://listserv.nd.edu/cgi-bin/wa?S2=CODE4LIB&q=&s=character-sets+for+dummies&f=&a=&b=
[2]http://www.tictocs.ac.uk/text.php

-- 
Glen Newton
Researcher, Information Science, CISTI Research
& NRC W3C Advisory Committee Representative
http://tinyurl.com/yvchmu
tel/tél: 613-990-9163 | facsimile/télécopieur 613-952-8246
Canada Institute for Scientific and Technical Information (CISTI)
National Research Council Canada (NRC)| M-55, 1200 Montreal Road
http://www.nrc-cnrc.gc.ca/
Institut canadien de l'information scientifique et technique (ICIST) 
Conseil national de recherches Canada | M-55, 1200 chemin Montréal
Ottawa, Ontario K1A 0R6  
Government of Canada | Gouvernement du Canada   
--