Thanks, Erik, some useful tools and advice.

I've solved the problem:

Using the emacs hexl-find-file, I could see that the wget file was OK:   
 
000021b0: 2d33 3638 320a 3232 3109 4163 7461 204f  -3682.221.Acta O
000021c0: 7274 6f70 c3a9 6469 6361 2042 7261 7369  rtop..dica Brasi
000021d0: 6c65 6972 6109 6874 7470 3a2f 2f77 7777  leira.http://www

But the file saved from Firefox was not:

000021b0: 2d33 3638 320a 3232 3109 4163 7461 204f  -3682.221.Acta O
000021c0: 7274 6f70 c383 c2a9 6469 6361 2042 7261  rtop....dica Bra
000021d0: 7369 6c65 6972 6109 6874 7470 3a2f 2f77  sileira.http://w

I checked my default character encoding in Firefox
[3.0.4: Edit-->Preferences; Content.Default Font.Advanced; Character
encoding.Default Character Encoding] and it turned out to be
'Western ISO-Latin 8859-1' (!). I changed it to 'UTF-8' and all the
diacritic problems went away.
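
The extra bytes in the Firefox dump are consistent with double encoding:
the UTF-8 bytes c3 a9 for 'é' read as ISO-8859-1 and then written out
again as UTF-8. A minimal Python sketch of that round trip follows. The
variable names are only illustrative, and it reproduces the byte pattern,
not Firefox's actual code path:

```python
# Illustrative only: reproduce the byte pattern seen in the two hex dumps.
original = "é".encode("utf-8")            # bytes as fetched by wget
misread = original.decode("iso-8859-1")   # what a Latin-1 reader thinks it got
double_encoded = misread.encode("utf-8")  # what is written when saved as UTF-8

print(original.hex(" "))        # c3 a9
print(double_encoded.hex(" "))  # c3 83 c2 a9

# The damage is reversible as long as every byte survived the round trip.
repaired = double_encoded.decode("utf-8").encode("iso-8859-1")
assert repaired == original
```

The same reversal can repair a whole file, but only if nothing else
altered the bytes in between.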

So it was a client software configuration problem, not the tictocs
site. 

I'll send tictocs an update email.

But I don't understand why Firefox was ignoring the header
 "Content-Type: text/plain; charset=utf-8"
It should not be using the default charset (ISO-Latin 8859-1) for
this content, as it has been told the text encoding is UTF-8...
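
For comparison, a client that honors the header would take the charset
from Content-Type and fall back to its default only when none is
declared. A small sketch using Python's standard library;
charset_from_content_type is a hypothetical helper, and this says
nothing about how Firefox itself decides:

```python
from email.message import Message


def charset_from_content_type(header_value, default="iso-8859-1"):
    """Return the charset parameter of a Content-Type value, else the default."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset() or default


# Bytes taken from the wget hex dump ("Ortop" + c3 a9 + "dica").
body = b"Acta Ortop\xc3\xa9dica Brasileira"
charset = charset_from_content_type("text/plain; charset=utf-8")
print(charset)
print(body.decode(charset))
```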

--

Thanks to all who helped (on- and off-list),

Glen

------------------------------
From:         Erik Hetzner <[log in to unmask]>
Sender:       Code for Libraries <[log in to unmask]>
To:           [log in to unmask]
Subject: Re: [CODE4LIB] Character problems with tictoc
Date:         Mon, 21 Dec 2009 11:24:49 -0800
Message-ID:  <[log in to unmask]>

At Mon, 21 Dec 2009 14:09:28 -0500,
Glen Newton wrote:
>
> It seems that different people are seeing different things in their
> respective viewers (i.e., some are OK and others are like what I am
> seeing).
>
> When I use wget and view the local file in Firefox (3.0.4, Linux Suse
> 11.0) I see:
>  http://cuvier.cisti.nrc.ca/~gnewton/tictoc1.gif
> [gif used as it is not lossy]
>
> The text is clearly not correct.
>
> The file I got with wget is:
>   http://cuvier.cisti.nrc.ca/~gnewton/tictoc.txt
>
> Is this just a question of different client software (and/or OSes)
> viewing or mangling the content?

When dealing with character set issues (especially the dreaded
double-encoding!) I find it best to use hex editors or dumpers. If in
emacs, try M-x hexl-find-file. On a Unix command line, the od or hd
commands are useful.
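
Where none of those tools is at hand, a few lines of Python give a
similar view. This is only a rough sketch in the spirit of hd, not a
replacement for it; hexdump is a hypothetical helper name:

```python
def hexdump(data, width=16):
    """Return lines of offset, hex bytes, and printable ASCII for the given bytes."""
    pad = width * 3 - 1  # room for a full row of "xx " groups
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable and non-ASCII bytes are shown as '.', as hd does.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}  |{text}|")
    return lines


for line in hexdump("Acta Ortopédica Brasileira\t".encode("utf-8")):
    print(line)
```

A correctly encoded 'é' shows up as the two bytes c3 a9; four bytes
(c3 83 c2 a9) in its place point to double encoding.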

For the record:

00000000  48 54 54 50 2f 31 2e 31  20 32 30 30 20 4f 4b 0d  |HTTP/1.1 200 OK.|
00000010  0a 44 61 74 65 3a 20 4d  6f 6e 2c 20 32 31 20 44  |.Date: Mon, 21 D|
00000020  65 63 20 32 30 30 39 20  31 39 3a 32 32 3a 33 38  |ec 2009 19:22:38|
00000030  20 47 4d 54 0d 0a 53 65  72 76 65 72 3a 20 41 70  | GMT..Server: Ap|
00000040  61 63 68 65 2f 32 2e 32  2e 31 33 20 28 55 6e 69  |ache/2.2.13 (Uni|
00000050  78 29 20 6d 6f 64 5f 73  73 6c 2f 32 2e 32 2e 31  |x) mod_ssl/2.2.1|
00000060  33 20 4f 70 65 6e 53 53  4c 2f 30 2e 39 2e 38 6b  |3 OpenSSL/0.9.8k|
00000070  20 50 48 50 2f 35 2e 33  2e 30 20 44 41 56 2f 32  | PHP/5.3.0 DAV/2|
00000080  0d 0a 58 2d 50 6f 77 65  72 65 64 2d 42 79 3a 20  |..X-Powered-By: |
00000090  50 48 50 2f 35 2e 33 2e  30 0d 0a 43 6f 6e 74 65  |PHP/5.3.0..Conte|
000000a0  6e 74 2d 54 79 70 65 3a  20 74 65 78 74 2f 70 6c  |nt-Type: text/pl|
000000b0  61 69 6e 3b 20 63 68 61  72 73 65 74 3d 75 74 66  |ain; charset=utf|
000000c0  2d 38 0d 0a 54 72 61 6e  73 66 65 72 2d 45 6e 63  |-8..Transfer-Enc|
000000d0  6f 64 69 6e 67 3a 20 63  68 75 6e 6b 65 64 0d 0a  |oding: chunked..|
...
00002230  4f 72 74 68 6f 70 61 65  64 69 63 61 09 68 74 74  |Orthopaedica.htt|
00002240  70 3a 2f 2f 69 6e 66 6f  72 6d 61 68 65 61 6c 74  |p://informahealt|
00002250  68 63 61 72 65 2e 63 6f  6d 2f 61 63 74 69 6f 6e  |hcare.com/action|
00002260  2f 73 68 6f 77 46 65 65  64 3f 6a 63 3d 6f 72 74  |/showFeed?jc=ort|
00002270  26 74 79 70 65 3d 65 74  6f 63 26 66 65 65 64 3d  |&type=etoc&feed=|
00002280  72 73 73 09 31 37 34 35  2d 33 36 37 34 09 31 37  |rss.1745-3674.17|
00002290  34 35 2d 33 36 38 32 0a  32 32 31 09 41 63 74 61  |45-3682.221.Acta|
000022a0  20 4f 72 74 6f 70 c3 a9  64 69 63 61 20 42 72 61  | Ortop..dica Bra|
000022b0  73 69 6c 65 69 72 61 09  68 74 74 70 3a 2f 2f 77  |sileira.http://w|
...

best,
Erik Hetzner

----------------------------------------------------------------------
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3
