Charset problem again - Mozilla

Thread: Charset problem again

  1. Charset problem again

    Hi,

    I thought my problem with utf-8 warning popups on outgoing mail
    had been solved by the addition of a charset fallback rule, but
    suddenly I am getting them again.

    To review, I am set to send in the ISO-8859-15 charset and the
    Verdana font. Thanks to help received here, I added to my
    about:config the following two rules:


    intl.fallbackCharsetList.ISO-8859-1 / user set / string / utf-8

    intl.fallbackCharsetList.ISO-8859-15 / user set / string / utf-8


    Now, as an example, I just tried to send the following text in an
    html mail:

    �democracy�

    That's copied from a Web page which FF is set to read in utf-8,
    and apparently the page is written in "font-family: Arial,
    Helvetica, sans-serif". I don't think the problematic characters
    print accurately here, but it looks like a black diamond with a
    white question mark in it. I've been getting *a lot* of that
    symbol lately in FF2, using utf-8 as my default charset, a
    charset which is supposed to read anything.

    In FF, when I switch to good old Windows-1252, the text in
    question simply reads

    “democracy”

    which seems to be printing here clearly.

    So I have two questions. First, generally, why won't utf-8 read a
    very common character? Even when I switch (on the fly) my
    outgoing mail to utf-8 the character is not read correctly.
    Second, if I have utf-8 set as a fallback encoding, why is TB
    showing me the encoding warning on outgoing mail? I thought I had
    that problem solved.

    Thanks,
    p.
    --
    Mail: Thunderbird 1.5.0.9
    News: Dialog 2.0.15.1 (beta 38)
    OS: Win XP-H sp2

  2. Re: Charset problem again

    Tbird Leader Paul_B radioed the tower, On 1/23/2007 7:14 PM:
    > [snip]


    Bear in mind the font used for display must be a UNICODE supporting font
    and the OS must also be UNICODE compliant. Windows 98SE and ME are only
    partially supportive of UNICODE. Arial font file needs to indicate a
    font version of 2.7x for support of more than Windows-1252 character set.
    --
    Ron K.
    Don't be a fonted, it's just type casting

  3. Re: Charset problem again

    On 2007-01-23 17:14 (-0700 UTC), Paul_B wrote:

    > [snip]


    Ron K. may have hit the nail on the head: What OS are you on?

    BTW, I suspect the whole thing about Win-1252 is a bit of a red herring,
    tho' I'm not sure why you keep encountering it. The positions for opening
    and closing double quotations in CP-1252 (/e.g./, Alt + 0147 and Alt +
    0148) are actually control characters in ISO-8859-1 (and Unicode).
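
A minimal sketch of the point about positions 147 and 148, using Python's standard codecs as a stand-in (the variable names are illustrative and nothing here comes from Thunderbird or Firefox): the same two byte values decode to curly quotes under CP-1252 but to control characters under ISO-8859-1.

```python
import unicodedata

raw = bytes([147, 148])  # the values typed as Alt+0147 / Alt+0148

as_cp1252 = raw.decode("cp1252")    # Windows-1252 interpretation
as_latin1 = raw.decode("latin-1")   # ISO-8859-1 interpretation

# Under CP-1252 these are the typographer's double quotes.
cp1252_names = [unicodedata.name(ch) for ch in as_cp1252]

# Under ISO-8859-1 the same values land on U+0093 and U+0094,
# whose Unicode general category is "Cc" (control).
latin1_categories = [unicodedata.category(ch) for ch in as_latin1]

print(cp1252_names)
print(latin1_categories)
```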

    /b.

    --
    People are stupid. /A/ person may be smart, but /people/ are stupid.
    --Stephen M. Graham

  4. Re: Charset problem again

    On Tue, 23 Jan 2007 18:30:19 -0700, Brian Heinrich wrote:

    > On 2007-01-23 17:14 (-0700 UTC), Paul_B wrote:
    >
    >> [snip]

    >
    > Ron K. may have hit the nail on the head: What OS are you on?
    >
    > BTW, I suspect the whole thing about Win-1252 is a bit of a red herring,
    > tho' I'm not sure why you keep encountering it. The positions for opening
    > and closing double quotations in CP-1252 (/e.g./, Alt + 0147 and Alt +
    > 0148) are actually control characters in ISO-8859-1 (and Unicode).
    >
    > /b.



    Brian, the only reason I keep bringing up Win-1252 is that I
    don't have any of these problems when I use it.

    Thanks,
    p.
    --
    Mail: Thunderbird 1.5.0.9
    News: Dialog 2.0.15.1 (beta 38)
    OS: Win XP-H sp2

  5. Re: Charset problem again

    On Tue, 23 Jan 2007 19:27:53 -0500, Ron K. wrote:

    > Tbird Leader Paul_B radioed the tower, On 1/23/2007 7:14 PM:
    >> [snip]

    >
    > Bear in mind the font used for display must be a UNICODE supporting font
    > and the OS must also be UNICODE compliant. Windows 98SE and ME are only
    > partially supportive of UNICODE. Arial font file needs to indicate a
    > font version of 2.7x for support of more than Windows-1252 character set.


    Ok. I'm on Win XP. The Arial is version 3.0. In Word I typed
    alt-0147/8, according to Brian's kind insight nearby, and it
    printed fine in both Arial and in Verdana, my default TB and FF
    font.

    Blowing it up, I see that it's simply typographer's quotes.

    So I continue to have no idea why I'm getting this error.

    Thanks,
    p.
    --
    Mail: Thunderbird 1.5.0.9
    News: Dialog 2.0.15.1 (beta 38)
    OS: Win XP-H sp2

  6. Re: Charset problem again

    On 2007-01-23 20:11 (-0700 UTC), Paul_B wrote:

    > On Tue, 23 Jan 2007 19:27:53 -0500, Ron K. wrote:
    >
    >> Tbird Leader Paul_B radioed the tower, On 1/23/2007 7:14 PM:
    >>> [snip]

    >> Bear in mind the font used for display must be a UNICODE supporting font
    >> and the OS must also be UNICODE compliant. Windows 98SE and ME are only
    >> partially supportive of UNICODE. Arial font file needs to indicate a
    >> font version of 2.7x for support of more than Windows-1252 character set.

    >
    > Ok. I'm on Win XP. The Arial is version 3.0. In Word I typed
    > alt-0147/8, according to Brian's kind insight nearby, and it
    > printed fine in both Arial and in Verdana, my default TB and FF
    > font.
    >
    > Blowing it up, I see that it's simply typographer's quotes.
    >
    > So I continue to have no idea why I'm getting this error.


    (Decimal) 0147 and 0148 are not printable ISO-8859-1/Unicode characters
    (they are control codes there) . . . but they are valid CP-1252 code points. . . .

    I have UTF-8 set for out-going e-mail and ISO-8859-1 set for in-coming mail
    . . . but I don't have Apply the default character encoding to all incoming
    messages selected; nor do I have Use the default character encoding in
    replies selected . . . and I've not been having any problems . . . FWIW. . . .

    /b.

    --
    People are stupid. /A/ person may be smart, but /people/ are stupid.
    --Stephen M. Graham

  7. Re: Charset problem again

    On Tue, 23 Jan 2007 20:55:35 -0700, Brian Heinrich wrote:

    > On 2007-01-23 20:11 (-0700 UTC), Paul_B wrote:
    >
    >> On Tue, 23 Jan 2007 19:27:53 -0500, Ron K. wrote:
    >>
    >>> Tbird Leader Paul_B radioed the tower, On 1/23/2007 7:14 PM:
    >>>> [snip]
    >>> Bear in mind the font used for display must be a UNICODE supporting font
    >>> and the OS must also be UNICODE compliant. Windows 98SE and ME are only
    >>> partially supportive of UNICODE. Arial font file needs to indicate a
    >>> font version of 2.7x for support of more than Windows-1252 character set.

    >>
    >> Ok. I'm on Win XP. The Arial is version 3.0. In Word I typed
    >> alt-0147/8, according to Brian's kind insight nearby, and it
    >> printed fine in both Arial and in Verdana, my default TB and FF
    >> font.
    >>
    >> Blowing it up, I see that it's simply typographer's quotes.
    >>
    >> So I continue to have no idea why I'm getting this error.

    >
    > (Decimal) 0147 and 0148 are not printable ISO-8859-1/Unicode characters
    > (they are control codes there) . . . but they are valid CP-1252 code points. . . .
    >
    > I have UTF-8 set for out-going e-mail and ISO-8859-1 set for in-coming mail
    > . . . but I don't have Apply the default character encoding to all incoming
    > messages selected; nor do I have Use the default character encoding in
    > replies selected . . . and I've not been having any problems . . . FWIW. . . .
    >
    > /b.



    I also have neither of those options checked. Have you checked to
    see what actually happens when you send out typographer's quotes?

    Interesting that utf-8 either doesn't recognize them or does
    under a different code number. And I can't be the only person who
    has this problem in FF's display.

    p.
    --
    Mail: Thunderbird 1.5.0.9
    News: Dialog 2.0.15.1 (beta 38)
    OS: Win XP-H sp2

  8. Re: Charset problem again

    On 2007-01-24 05:14 (-0700 UTC), Paul_B wrote:

    > On Tue, 23 Jan 2007 20:55:35 -0700, Brian Heinrich wrote:




    >> (Decimal) 0147 and 0148 are not printable ISO-8859-1/Unicode characters
    >> (they are control codes there) . . . but they are valid CP-1252 code points. . . .
    >>
    >> I have UTF-8 set for out-going e-mail and ISO-8859-1 set for in-coming mail
    >> . . . but I don't have Apply the default character encoding to all incoming
    >> messages selected; nor do I have Use the default character encoding in
    >> replies selected . . . and I've not been having any problems . . . FWIW. . . .

    >
    > I also have neither of those options checked. Have you checked to
    > see what actually happens when you send out typographer's quotes?


    They get sent out as typographer's quotes. :-) If, under Windows, I enter
    the CP-1252 code points, they're 'translated' by Moz/SM/Tb into the proper
    HTML decimal entities when I send out the e-mail -- /i.e./, if I enter
    Alt + 0146 on the keyboard, &#8217; (’) is what is sent out.
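
As a rough illustration of that translation (Python's codecs are used here as an assumption of convenience; Mozilla's actual code path is not shown in this thread): the CP-1252 value 146 corresponds to U+2019, and escaping it for an ASCII-only body yields the decimal entity.

```python
# Byte 146 is what Alt + 0146 produces in a CP-1252 context.
ch = bytes([146]).decode("cp1252")

# ord() gives the Unicode code point: 8217 decimal, i.e. U+2019.
code_point = ord(ch)

# "xmlcharrefreplace" swaps any character the target charset cannot
# hold for a decimal HTML/XML character reference.
entity = ch.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

print(code_point, entity)
```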

    > Interesting that utf-8 either doesn't recognize them or does
    > under a different code number.


    CP-1252 is a non-standard code page; [link missing]
    gives an indication of why CP-1252 differs from ISO-8859-1.

    If you look at the table at [link missing],
    you'll see that CP-1252 has characters in the range of 128-159 (80-9F),
    which are reserved for control characters in ISO-8859-1 (and consequently
    Unicode).
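
The 128-159 range can also be inspected directly. The sketch below walks that range with Python's `cp1252` and `latin-1` codecs (an assumption: Python's tables stand in for the linked chart, which did not survive in this archive) and separates the values CP-1252 assigns to printable characters from the few it leaves empty.

```python
defined = {}      # byte value -> character assigned by CP-1252
unassigned = []   # byte values CP-1252 leaves empty

for value in range(0x80, 0xA0):          # 128-159 inclusive
    try:
        defined[value] = bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        unassigned.append(value)

# ISO-8859-1 maps every one of these bytes straight to U+0080-U+009F,
# which Unicode reserves for control codes.
latin1_controls = [bytes([v]).decode("latin-1") for v in range(0x80, 0xA0)]

print(len(defined), [hex(v) for v in unassigned])
```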

    > And I can't be the only person who
    > has this problem in FF's display.


    The only issues I've ever really noted are with IPA extensions on Wikipedia.

    One issue that you /might/ experience is that if you have, say, a closing
    inverted comma, |’|, inserted as such -- you can see this if you try to
    validate a page like this: the W3C Validator will throw an error (on the
    line before the line on which the character appears) indicating that it's
    not a valid (/e.g./) UTF-8 character (most modern pages use UTF-8).

    Don't know if this helps clarify anything. . . .

    /b.

    --
    People are stupid. /A/ person may be smart, but /people/ are stupid.
    --Stephen M. Graham

  9. Re: Charset problem again

    Paul_B wrote:
    > On Tue, 23 Jan 2007 20:55:35 -0700, Brian Heinrich wrote:
    >
    >> On 2007-01-23 20:11 (-0700 UTC), Paul_B wrote:
    >>
    >>> On Tue, 23 Jan 2007 19:27:53 -0500, Ron K. wrote:
    >>>
    >>>> Tbird Leader Paul_B radioed the tower, On 1/23/2007 7:14 PM:
    >>>>> [snip]
    >>>> Bear in mind the font used for display must be a UNICODE supporting font
    >>>> and the OS must also be UNICODE compliant. Windows 98SE and ME are only
    >>>> partially supportive of UNICODE. Arial font file needs to indicate a
    >>>> font version of 2.7x for support of more than Windows-1252 character set.
    >>> Ok. I'm on Win XP. The Arial is version 3.0. In Word I typed
    >>> alt-0147/8, according to Brian's kind insight nearby, and it
    >>> printed fine in both Arial and in Verdana, my default TB and FF
    >>> font.
    >>>
    >>> Blowing it up, I see that it's simply typographer's quotes.
    >>>
    >>> So I continue to have no idea why I'm getting this error.

    >> (Decimal) 0147 and 0148 are not printable ISO-8859-1/Unicode characters
    >> (they are control codes there) . . . but they are valid CP-1252 code points. . . .
    >>
    >> I have UTF-8 set for out-going e-mail and ISO-8859-1 set for in-coming mail
    >> . . . but I don't have Apply the default character encoding to all incoming
    >> messages selected; nor do I have Use the default character encoding in
    >> replies selected . . . and I've not been having any problems . . . FWIW. . . .
    >>
    >> /b.

    >
    >
    > I also have neither of those options checked. Have you checked to
    > see what actually happens when you send out typographer's quotes?
    >
    > Interesting that utf-8 either doesn't recognize them or does
    > under a different code number. And I can't be the only person who
    > has this problem in FF's display.
    >
    > p.


    It does, but they are multibyte characters. I don't know (off the top of my
    head) which codepoints they are, but if you're interested you can find it out
    by going to http://www.unicode.org/charts/ which is one of two HTML pages of
    "table of contents" of the actual charts. (The other one is linked at the top,
    under "Symbols and Punctuation" IIRC, so maybe that other one is the one you
    need.)

    Warning: The "tables of contents" are in HTML but the charts proper are in
    PDF. You'll need a PDF reader (or a PDF plugin) to display them.
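
For readers without the PDF charts to hand, a small sketch can print the code points in question. Python's `unicodedata` module is used here purely for convenience; it is not something suggested in the thread.

```python
import unicodedata

# The opening and closing typographer's double quotes.
quotes = "\u201c\u201d"

# Map each quote character to its UTF-8 byte sequence.
quote_bytes = {ch: ch.encode("utf-8") for ch in quotes}

for ch, utf8 in quote_bytes.items():
    # Code point, official name, UTF-8 bytes in hex, and byte count.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), utf8.hex(), len(utf8))
```

Each quote occupies three bytes in UTF-8, which is what "multibyte" means here.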


    Best regards,
    Tony.

  10. Re: Charset problem again

    Tbird Leader Brian Heinrich radioed the tower, On 1/24/2007 1:14 PM:
    > On 2007-01-24 05:14 (-0700 UTC), Paul_B wrote:
    >
    >> On Tue, 23 Jan 2007 20:55:35 -0700, Brian Heinrich wrote:

    >
    >
    >
    >>> (Decimal) 0147 and 0148 are not printable ISO-8859-1/Unicode characters
    >>> (they are control codes there) . . . but they are valid CP-1252 code points. . . .
    >>>
    >>> I have UTF-8 set for out-going e-mail and ISO-8859-1 set for
    >>> in-coming mail . . . but I don't have Apply the default character
    >>> encoding to all incoming messages selected; nor do I have Use the
    >>> default character encoding in replies selected . . . and I've not
    >>> been having any problems . . . FWIW. . . .

    >>
    >> I also have neither of those options checked. Have you checked to see
    >> what actually happens when you send out typographer's quotes?

    >
    > They get sent out as typographer's quotes. :-) If, under Windows, I
    > enter the CP-1252 code points, they're 'translated' by Moz/SM/Tb into
    > the proper HTML decimal entities when I send out the e-mail -- /i.e./,
    > if I enter Alt + 0146 on the keyboard, &#8217; (’) is what is sent out.
    >
    >> Interesting that utf-8 either doesn't recognize them or does under a
    >> different code number.

    >
    > CP-1252 is a non-standard code page;
    > [link missing]
    > gives an indication of why CP-1252 differs from ISO-8859-1.
    >
    > If you look at the table at
    > [link missing], you'll see that CP-1252
    > has characters in the range of 128-159 (80-9F), which are reserved for
    > control characters in ISO-8859-1 (and consequently Unicode).
    >
    >> And I can't be the only person who has this problem in FF's display.

    >
    > The only issues I've ever really noted are with IPA extensions on
    > Wikipedia.
    >
    > One issue that you /might/ experience is that if you have, say, a
    > closing inverted comma, |’|, inserted as such -- you can see this if
    > you try to validate a page like this: the W3C Validator will throw an
    > error (on the line before the line on which the character appears)
    > indicating that it's not a valid (/e.g./) UTF-8 character (most modern
    > pages use UTF-8).
    >
    > Don't know if this helps clarify anything. . . .
    >
    > /b.
    >

    I suspect MS created the CP-1252, etc. to get additional characters for
    MS Word, etc. as the older DOS CP-437, etc. were not adaptable enough.
    The original True Type (TT) spec v-1.0 was more DOS related; however,
    the TT v-2.0 spec began the migration toward UNICODE and the CP-1252
    sort of character sets. Windows NT 2000 was the first version of Windows
    to be fully UNICODE compliant.

    There are lots of cases when Windows XP will refuse to use a TT font
    because the font spec is now adhered to on matters of all internal
    tables being present and having valid flags and values.

    --
    Ron K.
    Don't be a fonted, it's just type casting

  11. Re: Charset problem again

    Paul_B wrote:
    > On Tue, 23 Jan 2007 18:30:19 -0700, Brian Heinrich wrote:
    >
    >> On 2007-01-23 17:14 (-0700 UTC), Paul_B wrote:
    >>
    >>> [snip]
    >>>
    >>> Now, as an example, I just tried to send the following text in an
    >>> html mail:
    >>>
    >>> �democracy�
    >>>
    >>> That's copied from a Web page which FF is set to read in utf-8,
    >>> and apparently the page is written in "font-family: Arial,
    >>> Helvetica, sans-serif". I don't think the problematic characters
    >>> print accurately here, but it looks like a black diamond with a
    >>> white question mark in it. I've been getting *a lot* of that
    >>> symbol lately in FF2, using utf-8 as my default charset, a
    >>> charset which is supposed to read anything.
    >>>
    >>> In FF, when I switch to good old Windows-1252, the text in
    >>> question simply reads
    >>>
    >>> “democracy”
    >>>
    >>> which seems to be printing here clearly.
    >>>
    >>> So I have two questions. First, generally, why won't utf-8 read a
    >>> very common character? Even when I switch (on the fly) my
    >>> outgoing mail to utf-8 the character is not read correctly.
    >>> Second, if I have utf-8 set as a fallback encoding, why is TB
    >>> showing me the encoding warning on outgoing mail? I thought I had
    >>> that problem solved.

    >> Ron K. may have hit the nail on the head: What OS are you on?
    >>
    >> BTW, I suspect the whole thing about Win-1252 is a bit of a red herring,
    >> tho' I'm not sure why you keep encountering it. The positions for opening
    >> and closing double quotations in CP-1252 (/e.g./, Alt + 0147 and Alt +
    >> 0148) are actually control characters in ISO-8859-1 (and Unicode).

    >
    >
    > Brian, the only reason I keep bringing up Win-1252 is that I
    > don't have any of these problems when I use it.


    "When [you] use it" meaning "when you set that charset in Firefox,"
    right? The problem is not what's happening in Thunderbird, exactly: the
    problem is that the web page is either misreporting itself as, or being
    forced by you into, UTF-8. Since the page is actually Win1252, any
    character that's not in the plain ASCII range is going to be
    misinterpreted: even characters that share the numerical value between
    the two charsets are encoded with different byte sequences.

    So, Firefox's internal representation of that text (as UTF-8) is
    �democracy�
    and when you copy the text, that's what gets copied over -- it doesn't
    copy the source bytes into the clipboard, it copies the characters it
    thinks it has. Pasting that into TB is not going to fix the problem.
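
The sequence described above can be reproduced as a sketch under stated assumptions: Python's codecs stand in for Firefox's decoder, and `fits` is a hypothetical helper, not a Thunderbird function. It shows the quotes turning into U+FFFD when CP-1252 bytes are read as UTF-8, and that neither U+FFFD nor the curly quotes exist in ISO-8859-15, which may be why the outgoing-mail warning still appears.

```python
# The bytes a Windows-1252 page would contain for the quoted word.
page_bytes = "\u201cdemocracy\u201d".encode("cp1252")

# Forced into UTF-8, bytes 0x93 and 0x94 are invalid sequences, so a
# lenient decoder substitutes U+FFFD, the "black diamond" character.
seen_as_utf8 = page_bytes.decode("utf-8", errors="replace")

# Read with the charset the page was really written in, the quotes survive.
seen_as_cp1252 = page_bytes.decode("cp1252")


def fits(text, charset):
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


print(repr(seen_as_utf8), repr(seen_as_cp1252))
print(fits(seen_as_utf8, "iso8859_15"), fits(seen_as_utf8, "utf-8"))
```

Once the clipboard holds U+FFFD, the original quote characters cannot be recovered from the pasted text; the fix has to happen on the Firefox side, before copying.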

  12. Re: Charset problem again

    On Wed, 24 Jan 2007 11:14:36 -0700, Brian Heinrich wrote:

    > On 2007-01-24 05:14 (-0700 UTC), Paul_B wrote:
    >
    >> On Tue, 23 Jan 2007 20:55:35 -0700, Brian Heinrich wrote:

    >
    >
    >
    >>> (Decimal) 0147 and 0148 are not printable ISO-8859-1/Unicode characters
    >>> (they are control codes there) . . . but they are valid CP-1252 code points. . . .
    >>>
    >>> I have UTF-8 set for out-going e-mail and ISO-8859-1 set for in-coming mail
    >>> . . . but I don't have Apply the default character encoding to all incoming
    >>> messages selected; nor do I have Use the default character encoding in
    >>> replies selected . . . and I've not been having any problems . . . FWIW. . . .

    >>
    >> I also have neither of those options checked. Have you checked to
    >> see what actually happens when you send out typographer's quotes?

    >
    > They get sent out as typographer's quotes. :-) If, under Windows, I enter
    > the CP-1252 code points, they're 'translated' by Moz/SM/Tb into the proper
    > HTML decimal entities when I send out the e-mail -- /i.e./, if I enter
    > Alt + 0146 on the keyboard, &#8217; is what is sent out.
    >
    >> Interesting that utf-8 either doesn't recognize them or does
    >> under a different code number.

    >
    > CP-1252 is a non-standard code page;
    > gives
    > an indication of why CP-1252 differs from ISO-8859-1.
    >
    > If you look at the table at ,
    > you'll see that CP-1252 has characters in the range of 128-159 (80-9F),
    > which are reserved for control characters in ISO-8859-1 (and consequently
    > UTF-8).
    >
    >> And I can't be the only person who
    >> has this problem in FF's display.

    >
    > The only issues I've ever really noted are with IPA extensions on Wikipedia.
    >
    > One issue that you /might/ experience is that if you have, say, a closing
    > inverted comma, |’|, inserted as such -- you can see this if you try to
    > validate a page like this: the W3C Validator will throw an error (on the
    > line before the line on which the character appears) indicating that it's
    > not a valid (/e.g./) UTF-8 character (most modern pages use UTF-8).
    >
    > Don't know if this helps clarify anything. . . .
    >
    > /b.


    Gadzooks. Not sure I can wrap my brain around all that. I might
    switch to utf-8 on the outgo and hopefully be done with the whole
    thing.

    Thanks to all. Will update when I figure out what to do.
    p.

    --
    Mail: Thunderbird 1.5.0.9
    News: Dialog 2.0.15.1 (beta 38)
    OS: Win XP-H sp2
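
For anyone else trying to wrap their brain around the 128-159 range discussed in the quoted reply, a short Python sketch may help. It assumes Python's standard `cp1252` and `iso-8859-1` codecs; the helper `to_decimal_entity` is a hypothetical name for this example and is not how Mozilla builds its entities.

```python
import unicodedata

byte = bytes([147])  # decimal 147 = hex 93

# In CP-1252 this byte is a printable character...
in_cp1252 = byte.decode("cp1252")
# ...while in ISO-8859-1 the same byte is a C1 control character.
in_latin1 = byte.decode("iso-8859-1")

name_cp1252 = unicodedata.name(in_cp1252)          # a quotation mark
category_latin1 = unicodedata.category(in_latin1)  # "Cc" means control


def to_decimal_entity(ch: str) -> str:
    """Hypothetical helper: HTML decimal entity for one character."""
    return f"&#{ord(ch)};"


# CP-1252 decimal 146 is the right single quote, Unicode 8217 decimal.
entity = to_decimal_entity(bytes([146]).decode("cp1252"))

print(name_cp1252, category_latin1, entity)
```

The entity form is plain ASCII, which is why it survives whatever charset the message is labelled with.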

  13. Re: Charset problem again

    On Wed, 24 Jan 2007 18:15:45 -0800, Mike Cowperthwaite wrote:

    > Paul_B wrote:
    >> On Tue, 23 Jan 2007 18:30:19 -0700, Brian Heinrich wrote:
    >>
    >>> On 2007-01-23 17:14 (-0700 UTC), Paul_B wrote:
    >>>
    >>>> [snip]
    >>>>
    >>>> Now, as an example, I just tried to send the following text in an
    >>>> html mail:
    >>>>
    >>>> �democracy�
    >>>>
    >>>> That's copied from a Web page which FF is set to read in utf-8,
    >>>> and apparently the page is written in "font-family: Arial,
    >>>> Helvetica, sans-serif". I don't think the problematic characters
    >>>> print accurately here, but it looks like a black diamond with a
    >>>> white question mark in it. I've been getting *a lot* of that
    >>>> symbol lately in FF2, using utf-8 as my default charset, a
    >>>> charset which is supposed to read anything.
    >>>>
    >>>> In FF, when I switch to good old Windows-1252, the text in
    >>>> question simply reads
    >>>>
    >>>> “democracy”
    >>>>
    >>>> which seems to be printing here clearly.
    >>>>
    >>>> So I have two questions. First, generally, why won't utf-8 read a
    >>>> very common character? Even when I switch (on the fly) my
    >>>> outgoing mail to utf-8 the character is not read correctly.
    >>>> Second, if I have utf-8 set as a fallback encoding, why is TB
    >>>> showing me the encoding warning on outgoing mail? I thought I had
    >>>> that problem solved.
    >>> Ron K. may have hit the nail on the head: What OS are you on?
    >>>
    >>> BTW, I suspect the whole thing about Win-1252 is a bit of a red herring,
    >>> tho' I'm not sure why you keep encountering it. The positions for opening
    >>> and closing double quotations in CP-1252 (/e.g./, Alt + 0147 and Alt +
    >>> 0148) are actually control characters in ISO-8859-1 (and UTF-8).

    >>
    >> Brian, the only reason I keep bringing up Win-1252 is that I
    >> don't have any of these problems when I use it.

    >
    > "When [you] use it" meaning "when you set that charset in Firefox,"
    > right? The problem is not what's happening in Thunderbird, exactly: the
    > problem is that the web page is either misreporting itself as, or being
    > forced by you into, UTF-8. Since the page is actually Win1252, any
    > character that's not in the plain ASCII range is going to be
    > misinterpreted: even characters that share the numerical value between
    > the two charsets are encoded with different byte sequences.
    >
    > So, Firefox's internal representation of that text (as UTF-8) is
    > �democracy�
    > and when you copy the text, that's what gets copied over -- it doesn't
    > copy the source bytes into the clipboard, it copies the characters it
    > thinks it has. Pasting that into TB is not going to fix the problem.



    Hmm. There are two aspects to my "using it". First, if I switch
    to win-1252 via FF's View menu, the result is almost always a
    perfect rendition.

    But also - and this could be where I'm getting into trouble - I
    wrote a rule for Proxomitron to change the encoding spec in page
    headers to utf-8. I assumed that doing that would tell FF to read
    it in that encoding. Maybe I was introducing translation error?

    ....

    I believe this is at least part of the problem. I just posted
    typographer's quotes to that same forum where I had the problem,
    and they printed fine. The encoding was ISO-8859-1, and I had
    switched my default encoding to utf-8. Maybe my trying to force a
    solution was actually causing a worse problem.

    I'll monitor this a bit and see how it fares.

    Thanks,
    p.
    --
    Mail: Thunderbird 1.5.0.9
    News: Dialog 2.0.15.1 (beta 38)
    OS: Win XP-H sp2
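
The difference between rewriting the charset label and actually converting the page can be sketched in Python as below. This is an assumption-laden illustration: `relabel_only` and `transcode` are hypothetical names, and nothing here reflects how Proxomitron rules are written.

```python
def relabel_only(body: bytes) -> str:
    """What a browser sees if only the header is changed to utf-8
    while the page bytes are left untouched."""
    return body.decode("utf-8", errors="replace")


def transcode(body: bytes, source_charset: str) -> bytes:
    """Decode with the charset the page really uses, then re-encode
    as UTF-8, so the new utf-8 label is actually true."""
    return body.decode(source_charset).encode("utf-8")


# A page body that is really Windows-1252.
page = "caf\u00e9 \u201cdemocracy\u201d".encode("cp1252")

broken = relabel_only(page)                          # gains U+FFFD characters
fixed = transcode(page, "cp1252").decode("utf-8")    # text is intact

print(repr(broken))
print(repr(fixed))
```

Plain ASCII text comes through either way, which would explain why a relabelled page looks mostly fine until an accent or a curly quote turns up.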

  14. Re: Charset problem again

    On Wed, 24 Jan 2007 21:44:50 -0500, Paul_B wrote:

    > On Wed, 24 Jan 2007 11:14:36 -0700, Brian Heinrich wrote:
    >
    >> On 2007-01-24 05:14 (-0700 UTC), Paul_B wrote:
    >>
    >>> On Tue, 23 Jan 2007 20:55:35 -0700, Brian Heinrich wrote:

    >>
    >>
    >>
    >>>> (Decimal) 0147 and 0148 are not valid ISO-8859-1/Unicode code points . . .
    >>>> but they are valid CP-1252 code points. . . .
    >>>>
    >>>> I have UTF-8 set for out-going e-mail and ISO-8859-1 set for in-coming mail .
    >>>> . . but I don't have Apply the default character encoding to all incoming
    >>>> messages selected; nor do I have Use the default character encoding in
    >>>> replies selected . . . and I've not been having any problems . . . FWIW. . . .
    >>>
    >>> I also have neither of those options checked. Have you checked to
    >>> see what actually happens when you send out typographer's quotes?

    >>
    >> They get sent out as typographer's quotes. :-) If, under Windows, I enter
    >> the CP-1252 code points, they're 'translated' by Moz/SM/Tb into the proper
    >> HTML decimal entities when I send out the e-mail -- /i.e./, if I enter
    >> Alt + 0146 on the keyboard, &#8217; is what is sent out.
    >>
    >>> Interesting that utf-8 either doesn't recognize them or does
    >>> under a different code number.

    >>
    >> CP-1252 is a non-standard code page;
    >> gives
    >> an indication of why CP-1252 differs from ISO-8859-1.
    >>
    >> If you look at the table at ,
    >> you'll see that CP-1252 has characters in the range of 128-159 (80-9F),
    >> which are reserved for control characters in ISO-8859-1 (and consequently
    >> UTF-8).
    >>
    >>> And I can't be the only person who
    >>> has this problem in FF's display.

    >>
    >> The only issues I've ever really noted are with IPA extensions on Wikipedia.
    >>
    >> One issue that you /might/ experience is that if you have, say, a closing
    >> inverted comma, |’|, inserted as such -- you can see this if you try to
    >> validate a page like this: the W3C Validator will throw an error (on the
    >> line before the line on which the character appears) indicating that it's
    >> not a valid (/e.g./) UTF-8 character (most modern pages use UTF-8).
    >>
    >> Don't know if this helps clarify anything. . . .
    >>
    >> /b.

    >
    > Gadzooks. Not sure I can wrap my brain around all that. I might
    > switch to utf-8 on the outgo and hopefully be done with the whole
    > thing.
    >
    > Thanks to all. Will update when I figure out what to do.
    > p.


    Ok, here's what I've done. I set utf-8 as my default charset. I
    also set it to be used for all replies.

    I then copied plain text from a distribution email, clicked
    Forward, and overwrote all the html graphics stuff with my plain
    copied text. When I went to send the mail the charset warning
    came up!

    I then checked the operative charset for that mail, and none was
    selected. So I guess TB doesn't apply the default charset to
    Forwards, even when the "use default charset for Replies"
    function is enabled. I guess that makes sense, but it creates a
    slight nuisance on this end.

    Anyway, I think I have solved most of this problem. Thanks again
    to all.

    p.
    --
    Mail: Thunderbird 1.5.0.9
    News: Dialog 2.0.15.1 (beta 38)
    OS: Win XP-H sp2
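
As a rough model of why the warning appears, the Python sketch below tries a preferred charset and then a fallback list, in the spirit of the intl.fallbackCharsetList.* prefs mentioned at the start of the thread. It is a guess at the general idea only, not Thunderbird's real logic, and `pick_charset` is a hypothetical function.

```python
from typing import Optional, Sequence


def pick_charset(text: str, preferred: str,
                 fallbacks: Sequence[str]) -> Optional[str]:
    """Return the first charset that can encode every character in
    text, or None (the case where a mailer would have to warn)."""
    for charset in [preferred, *fallbacks]:
        try:
            text.encode(charset)
            return charset
        except UnicodeEncodeError:
            continue
    return None


# Plain text fits the preferred charset.
plain = pick_charset("democracy", "iso-8859-15", ["utf-8"])

# Curly quotes are not in ISO-8859-15, so the fallback is needed.
quoted = pick_charset("\u201cdemocracy\u201d", "iso-8859-15", ["utf-8"])

# With no fallback configured there is nothing left to try.
no_fallback = pick_charset("\u201cdemocracy\u201d", "iso-8859-15", [])

print(plain, quoted, no_fallback)
```

In this model a forwarded message with no charset selected has no preferred entry to start from, which would be consistent with the warning described above.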
