[I realise there was a related 'Character-sets for dummies'[1]
discussion recently]
I am using the tictocs[2] list of journal RSS feeds, and I am getting
gibberish in places where diacritics should appear. Below is an example:
in emacs:
221 Acta Ortop dica Brasileira http://www.scielo.br/rss.php?pid=1413-7852&lang=en 1413-7852
in Firefox:
221 Acta Ortop dica Brasileira http://www.scielo.br/rss.php?pid=1413-7852&lang=en 1413-7852
Note that the emacs view is the same for both a copy saved from Firefox
and a direct download using 'wget'.
Is this something on my end, or are the tictocs people not serving
proper UTF-8?
The HTTP header from wget claims UTF-8:
> wget -S http://www.tictocs.ac.uk/text.php
> --2009-12-21 12:47:59-- http://www.tictocs.ac.uk/text.php
> Resolving www.tictocs.ac.uk... 130.88.101.131
> Connecting to www.tictocs.ac.uk|130.88.101.131|:80... connected.
> HTTP request sent, awaiting response...
> HTTP/1.1 200 OK
> Date: Mon, 21 Dec 2009 17:42:05 GMT
> Server: Apache/2.2.13 (Unix) mod_ssl/2.2.13 OpenSSL/0.9.8k PHP/5.3.0 DAV/2
> X-Powered-By: PHP/5.3.0
> Content-Type: text/plain; charset=utf-8
> Connection: close
> Length: unspecified [text/plain]
><....stuff removed>
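One way to separate "the header says UTF-8" from "the bytes are UTF-8" is to try a strict decode of the raw download. Below is a minimal sketch, assuming Python 3; the function names find_invalid_utf8 and check_url are hypothetical helpers written for this illustration, not part of any tictocs tooling.

```python
import urllib.request


def find_invalid_utf8(data: bytes, limit: int = 10):
    """Return up to `limit` (offset, byte_value) pairs where strict
    UTF-8 decoding of `data` fails. An empty list means valid UTF-8."""
    problems = []
    pos = 0
    while pos < len(data) and len(problems) < limit:
        try:
            data[pos:].decode("utf-8")  # strict decoding is the default
            break                       # the rest of the data is valid
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice data[pos:]
            bad_at = pos + err.start
            problems.append((bad_at, data[bad_at]))
            pos += err.end              # resume after the offending bytes
    return problems


def check_url(url: str):
    """Fetch `url` and return (declared_charset, problems).
    Hypothetical helper: it needs network access and is not called here."""
    with urllib.request.urlopen(url) as response:
        declared = response.headers.get_content_charset()
        body = response.read()
    return declared, find_invalid_utf8(body)


# Example use (not run here):
#   declared, problems = check_url("http://www.tictocs.ac.uk/text.php")
#   for offset, value in problems:
#       print(f"offset {offset}: byte 0x{value:02X} is not valid UTF-8")
```

If the list comes back empty, the download is well-formed UTF-8 and the problem is more likely on the viewing side. If it reports single bytes such as 0xE9, that would suggest Latin-1 (ISO-8859-1) text being served under a UTF-8 label, though that is only one possible explanation.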
Can someone confirm whether they are also seeing this issue?
Thanks,
Glen
[1]https://listserv.nd.edu/cgi-bin/wa?S2=CODE4LIB&q=&s=character-sets+for+dummies&f=&a=&b=
[2]http://www.tictocs.ac.uk/text.php
--
Glen Newton
Researcher, Information Science, CISTI Research
& NRC W3C Advisory Committee Representative
http://tinyurl.com/yvchmu
tel/tél: 613-990-9163 | facsimile/télécopieur 613-952-8246
Canada Institute for Scientific and Technical Information (CISTI)
National Research Council Canada (NRC)| M-55, 1200 Montreal Road
http://www.nrc-cnrc.gc.ca/
Institut canadien de l'information scientifique et technique (ICIST)
Conseil national de recherches Canada | M-55, 1200 chemin Montréal
Ottawa, Ontario K1A 0R6
Government of Canada | Gouvernement du Canada
--