Re: [9fans] simplicity - Plan9


Thread: Re: [9fans] simplicity

  1. Re: [9fans] simplicity

    My most annoying locale problem concerned reading Czech HTML emails in
    mh. Don't ask why, just accept that I got a lot of these and could not
    simply ignore them. The problem was that mh saw a text/html MIME type
    and, as it does for text, helpfully converted from the original encoding,
    usually CP1250 or iso8859-2, to the encoding specified in my locale
    environment variable, utf-8. Since the content was html, it then handed
    it to a ``browser'', in my case w3m, for pretty formatting. w3m read the
    encoding from the html header, thought its input was CP1250 or iso8859-2,
    and helpfully converted to utf-8. Both programs were behaving in a
    vaguely sensible way, but iconv was being run twice, and the result was
    gibberish. It took me a while to figure out what was happening and a
    while to figure out a way to make it stop. I don't know what the general
    answer to problems like this is. Forcing everyone to use English is not
    an option. Forcing everyone to use utf-8 would be better, but is not
    going to happen either.

    John
    --
    John Stalker
    School of Mathematics
    Trinity College Dublin
    tel +353 1 896 1983
    fax +353 1 896 2282
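
For readers who want to see the failure concretely, here is a minimal sketch of the double conversion described above. The two functions are hypothetical stand-ins for the mailer and the browser, not the actual mh or w3m code, and the sample text and the choice of CP1250 are assumptions made for illustration.

```python
# Hypothetical stand-ins for the two programs; neither is real mh or w3m code.
ORIGINAL = "Příliš žluťoučký"  # sample Czech text, chosen for illustration


def mailer_convert(raw: bytes, declared: str = "cp1250") -> bytes:
    """First conversion: from the declared charset to UTF-8 (the locale's choice)."""
    return raw.decode(declared).encode("utf-8")


def browser_convert(data: bytes, declared: str = "cp1250") -> bytes:
    """Second conversion: trusts the now stale charset label in the HTML header."""
    # errors="replace" keeps the sketch from raising on bytes CP1250 leaves undefined.
    return data.decode(declared, errors="replace").encode("utf-8")


wire_bytes = ORIGINAL.encode("cp1250")  # what arrived in the message
once = mailer_convert(wire_bytes)       # correct UTF-8
twice = browser_convert(once)           # UTF-8 bytes misread as CP1250

print(once.decode("utf-8"))   # readable
print(twice.decode("utf-8"))  # gibberish
```

Each step is reasonable in isolation; the damage comes from the second program not knowing that the first has already done the job.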

  2. Re: [9fans] simplicity

    > Forcing everyone to use utf-8 would be better, but is not
    > going to happen either.


    it will, it will just take some time (some things will be in utf-x for x>8)
    partly because it isn't `forced' (who could ever do the `forcing')


  3. Re: [9fans] simplicity

    > My most annoying locale problem concerned reading Czech HTML emails in
    > mh. Don't ask why, just accept that I got a lot of these and could not
    > simply ignore them. The problem was that mh saw a text/html MIME type
    > and, as it does for text, helpfully converted from the original encoding,
    > usually CP1250 or iso8859-2, [...]


    i think this is a character set conversion problem, not a locale
    problem. a small distinction, but i think one can live with converting
    character sets as they come onto a system. localized (ha!) complexity.

    - erik

  4. Re: [9fans] simplicity

    > i think this is a character set conversion problem, not a locale
    > problem. a small distinction, but i think one can live with converting
    > character sets as they come onto a system. localized (ha!) complexity.


    I'm not sure your solution is always the correct one, or is implementable.
    Should an MTA silently convert incoming mail to the local character set?
    I'm not sure I want that. The other program in my example was a web
    browser reading from a pipe. It can't know whether it's processing data
    as it comes into the system or data which is already there and has already
    been converted, unless either it can trust the meta tag in the document to
    have been updated or the conversion is pushed out into the network layer.
    Also, it's meaningful to talk about the system character set in the plan9
    world or the windows world, but not under UNIX, which is where I spend
    most of my time, for better or worse.
    --
    John Stalker
    School of Mathematics
    Trinity College Dublin
    tel +353 1 896 1983
    fax +353 1 896 2282
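
One of the two escape routes mentioned above is that the meta tag can be trusted to have been updated. A rough sketch of what that would require from whichever program converts first is given below. The function name and the regular expression are assumptions for illustration only, and the pattern is deliberately simple; real-world HTML would need a proper parser.

```python
import re

# Matches the charset value inside a <meta> tag. A simplification for the sketch.
META_CHARSET = re.compile(
    rb"(<meta[^>]*charset=[\"']?)([A-Za-z0-9_.:-]+)", re.IGNORECASE
)


def to_utf8_and_relabel(html: bytes, declared: str) -> bytes:
    """Convert HTML bytes to UTF-8 and update the first meta charset label to match."""
    converted = html.decode(declared).encode("utf-8")
    # Rewrite the label so a downstream reader does not convert a second time.
    return META_CHARSET.sub(rb"\g<1>utf-8", converted, count=1)


page = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=windows-1250"></head>'
    "<body>Příliš</body></html>"
).encode("cp1250")

print(to_utf8_and_relabel(page, "cp1250").decode("utf-8"))
```

A browser reading this output from a pipe would see a label that agrees with the bytes, whether or not a conversion had already happened upstream.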

  5. Re: [9fans] simplicity

    On Wed Oct 10 10:05:45 EDT 2007, stalker@maths.tcd.ie wrote:
    > > i think this is a character set conversion problem, not a locale
    > > problem. a small distinction, but i think one can live with converting
    > > character sets as they come onto a system. localized (ha!) complexity.

    >
    > I'm not sure your solution is always the correct one, or is implementable.
    > Should an MTA silently convert incoming mail to the local character set?


    it doesn't have to. upas/fs does given the character set in the file.
    i've thought about the mta doing it. i think that would be a nice solution.

    > I'm not sure I want that. The other program in my example was a web
    > browser reading from a pipe. It can't know whether it's processing data
    > as it comes into the system or data which is already there and has already
    > been converted, unless either it can trust the meta tag in the document to
    > have been updated or the conversion is pushed out into the network layer.


    what is the standard. if the encoding in the header is x does that mean
    that the encoding in the html header needs to be x? what happens if they
    differ?

    the only case that makes sense is that they have to be the same. but html
    and http generally run counter to common sense. ;-)

    - erik

  6. Re: [9fans] simplicity

    > > I'm not sure your solution is always the correct one, or is implementable.
    > > Should an MTA silently convert incoming mail to the local character set?

    >
    > it doesn't have to. upas/fs does given the character set in the file.
    > i've thought about the mta doing it. i think that would be a nice solution.


    In my case this was being done by the MUA, which was mh rather than upas,
    but the net effect is the same.

    > > I'm not sure I want that. The other program in my example was a web
    > > browser reading from a pipe. It can't know whether it's processing data
    > > as it comes into the system or data which is already there and has already
    > > been converted, unless either it can trust the meta tag in the document to
    > > have been updated or the conversion is pushed out into the network layer.
    >
    > what is the standard. if the encoding in the header is x does that mean
    > that the encoding in the html header needs to be x? what happens if they
    > differ?
    >
    > the only case that makes sense is that they have to be the same. but html
    > and http generally run counter to common sense. ;-)


    I don't know what happens if they differ. In my case they were the same, but
    the problem was that both programs assigned themselves the job of converting.
    I think that the mailer SHOULD NOT, to use the RFC capitals, convert the
    character set if it is handing off the display job to another program. In any
    case that's the way I set things up once I figured out what was going on.
    This is counter to the way the CRLF issue is handled, though. There the network
    standard is CRLF and systems which use other conventions, including all the ones I use,
    are expected to convert before sending and after receiving so no local programs
    need to know about such issues.
    --
    John Stalker
    School of Mathematics
    Trinity College Dublin
    tel +353 1 896 1983
    fax +353 1 896 2282
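
The SHOULD NOT rule proposed above can be sketched as a small dispatch decision. This is an illustration under stated assumptions, not how mh is actually configured: `prepare_part` and its parameters are hypothetical names, and UTF-8 is assumed as the local encoding.

```python
def prepare_part(body: bytes, declared: str, external_viewer: bool) -> bytes:
    """Decide whether the mailer converts a message part (hypothetical helper).

    If the part is handed to another program for display, pass the bytes through
    untouched so that exactly one program performs the conversion.
    """
    if external_viewer:
        return body  # the viewer reads the charset label and converts itself
    # The mailer displays the part itself, so it converts to the local encoding.
    return body.decode(declared).encode("utf-8")


raw = "Příliš".encode("iso8859-2")
handed_off = prepare_part(raw, "iso8859-2", external_viewer=True)
shown_inline = prepare_part(raw, "iso8859-2", external_viewer=False)

print(handed_off == raw)             # bytes are untouched for the external viewer
print(shown_inline.decode("utf-8"))  # converted once for inline display
```

Either branch leaves the text converted exactly once, which is the property the original setup lacked.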
