Thanks, Erik, some useful tools and advice.
I've solved the problem:
Using Emacs's hexl-find-file, I could see that the file fetched with wget was OK:
000021b0: 2d33 3638 320a 3232 3109 4163 7461 204f -3682.221.Acta O
000021c0: 7274 6f70 c3a9 6469 6361 2042 7261 7369 rtop..dica Brasi
000021d0: 6c65 6972 6109 6874 7470 3a2f 2f77 7777 leira.http://www
But the file saved from Firefox was not:
000021b0: 2d33 3638 320a 3232 3109 4163 7461 204f -3682.221.Acta O
000021c0: 7274 6f70 c383 c2a9 6469 6361 2042 7261 rtop....dica Bra
000021d0: 7369 6c65 6972 6109 6874 7470 3a2f 2f77 sileira.http://w
I checked my default character encoding in Firefox
[3.0.4: Edit-->Preferences; Content.Default Font.Advanced; Character
encoding.Default Character Encoding] and it turned out it was
'Western ISO-Latin 8859-1' (!). I changed it to 'UTF-8' and all the
diacritic problems went away.
So it was a client software configuration problem, not a problem with
the tictocs site.
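
For anyone wanting to reproduce the two byte sequences in the dumps above,
here is a minimal Python sketch. It assumes the mangling was one round of
"read UTF-8 bytes as ISO-8859-1, then save as UTF-8"; that fits the bytes
shown, but it is a guess about what Firefox did. The helper
undo_double_encoding is a hypothetical name, not part of any library.

```python
# Bytes for "é" as they appear in the wget file (c3 a9).
original = "é".encode("utf-8")

# Assumption: the client misread those bytes as ISO-8859-1 and then
# wrote the resulting two characters back out as UTF-8.
mangled = original.decode("iso-8859-1").encode("utf-8")

print(original.hex())  # c3a9
print(mangled.hex())   # c383c2a9, matching the Firefox-saved file


def undo_double_encoding(data: bytes) -> bytes:
    """Reverse one round of the misdecoding described above.

    Hypothetical helper: raises UnicodeError if the data was not
    double-encoded in exactly this way.
    """
    return data.decode("utf-8").encode("iso-8859-1")
```

Running undo_double_encoding on the mangled bytes gives back c3 a9, but
only try this on a copy of a file, since text that was never
double-encoded will either fail or be damaged.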
I'll send tictocs an update email.
But I don't understand why Firefox was ignoring the
"Content-Type: text/plain; charset=utf-8"
It should not be using the default charset (ISO-Latin 8859-1) for
this content, as it has been told the text encoding is UTF-8...
--
Thanks to all who helped (on- and off-list),
Glen
------------------------------
From: Erik Hetzner <[log in to unmask]>
Sender: Code for Libraries <[log in to unmask]>
To: [log in to unmask]
Subject: Re: [CODE4LIB] Character problems with tictoc
Date: Mon, 21 Dec 2009 11:24:49 -0800
Message-ID: <[log in to unmask]>
At Mon, 21 Dec 2009 14:09:28 -0500,
Glen Newton wrote:
>
> It seems that different people are seeing different things in their
> respective viewers (i.e., some are OK and others are like what I am
> seeing).
>
> When I use wget and view the local file in Firefox (3.0.4, Linux Suse
> 11.0) I see:
> http://cuvier.cisti.nrc.ca/~gnewton/tictoc1.gif
> [gif used as it is not lossy]
>
> The text is clearly not correct.
>
> The file I got with wget is:
> http://cuvier.cisti.nrc.ca/~gnewton/tictoc.txt
>
> Is this just a question of different client software (and/or OSes)
> viewing or mangling the content?
When dealing with character set issues (especially the dreaded
double-encoding!) I find it best to use hex editors or dumpers. If in
emacs, try M-x hexl-find-file. On a Unix command line, the od or hd
commands are useful.
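
Where none of those tools is at hand, a few lines of Python give a similar
view. This is only a rough sketch of an hd-style dump, not a replacement
for od or hd; the function name hexdump and its layout are my own choices.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Return an hd-like dump: offset, hex bytes, printable ASCII."""
    pad = width * 3 - 1  # length of a full row of "xx xx ... xx"
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable and non-ASCII bytes are shown as "."
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x} {hex_part.ljust(pad)} |{text_part}|")
    return "\n".join(lines)


print(hexdump("Acta Ortopédica".encode("utf-8")))
# 00000000 41 63 74 61 20 4f 72 74 6f 70 c3 a9 64 69 63 61 |Acta Ortop..dica|
```

A correctly encoded "é" shows up as the two bytes c3 a9; four bytes
c3 83 c2 a9 in the same place are the sign of double-encoding.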
For the record:
00000000 48 54 54 50 2f 31 2e 31 20 32 30 30 20 4f 4b 0d |HTTP/1.1 200 OK.|
00000010 0a 44 61 74 65 3a 20 4d 6f 6e 2c 20 32 31 20 44 |.Date: Mon, 21 D|
00000020 65 63 20 32 30 30 39 20 31 39 3a 32 32 3a 33 38 |ec 2009 19:22:38|
00000030 20 47 4d 54 0d 0a 53 65 72 76 65 72 3a 20 41 70 | GMT..Server: Ap|
00000040 61 63 68 65 2f 32 2e 32 2e 31 33 20 28 55 6e 69 |ache/2.2.13 (Uni|
00000050 78 29 20 6d 6f 64 5f 73 73 6c 2f 32 2e 32 2e 31 |x) mod_ssl/2.2.1|
00000060 33 20 4f 70 65 6e 53 53 4c 2f 30 2e 39 2e 38 6b |3 OpenSSL/0.9.8k|
00000070 20 50 48 50 2f 35 2e 33 2e 30 20 44 41 56 2f 32 | PHP/5.3.0 DAV/2|
00000080 0d 0a 58 2d 50 6f 77 65 72 65 64 2d 42 79 3a 20 |..X-Powered-By: |
00000090 50 48 50 2f 35 2e 33 2e 30 0d 0a 43 6f 6e 74 65 |PHP/5.3.0..Conte|
000000a0 6e 74 2d 54 79 70 65 3a 20 74 65 78 74 2f 70 6c |nt-Type: text/pl|
000000b0 61 69 6e 3b 20 63 68 61 72 73 65 74 3d 75 74 66 |ain; charset=utf|
000000c0 2d 38 0d 0a 54 72 61 6e 73 66 65 72 2d 45 6e 63 |-8..Transfer-Enc|
000000d0 6f 64 69 6e 67 3a 20 63 68 75 6e 6b 65 64 0d 0a |oding: chunked..|
...
00002230 4f 72 74 68 6f 70 61 65 64 69 63 61 09 68 74 74 |Orthopaedica.htt|
00002240 70 3a 2f 2f 69 6e 66 6f 72 6d 61 68 65 61 6c 74 |p://informahealt|
00002250 68 63 61 72 65 2e 63 6f 6d 2f 61 63 74 69 6f 6e |hcare.com/action|
00002260 2f 73 68 6f 77 46 65 65 64 3f 6a 63 3d 6f 72 74 |/showFeed?jc=ort|
00002270 26 74 79 70 65 3d 65 74 6f 63 26 66 65 65 64 3d |&type=etoc&feed=|
00002280 72 73 73 09 31 37 34 35 2d 33 36 37 34 09 31 37 |rss.1745-3674.17|
00002290 34 35 2d 33 36 38 32 0a 32 32 31 09 41 63 74 61 |45-3682.221.Acta|
000022a0 20 4f 72 74 6f 70 c3 a9 64 69 63 61 20 42 72 61 | Ortop..dica Bra|
000022b0 73 69 6c 65 69 72 61 09 68 74 74 70 3a 2f 2f 77 |sileira.http://w|
...
best,
Erik Hetzner
----------------------------------------------------------------------
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3