Re: UTF-8 and Alt key in the console - Kernel



Thread: Re: UTF-8 and Alt key in the console

  1. Re: UTF-8 and Alt key in the console


    --- "H. Peter Anvin" wrote:

    > John T. wrote:
    > >
    > > OK, let's see if I can answer this.
    > >
    > > Vi has 32 years of ESC key use tradition which doesn't play
    > > well with "meta sends ESC".
    > >
    > > Even though "meta sets 8th bit" is "broken" in your point-of-view,
    > > that didn't stop it from being used all these years. The fact
    > > that it maps into real characters is not a problem if you can just
    > > use a CTRL-V equivalent in bash or vim.
    > >
    > > Furthermore, it is an _option_. No one is obliged to use it.
    > > So it's a question of:
    > >
    > > .. _forcing_ the end of "meta sets 8th bit"
    > > .. leaving things the way they are, and have them keep working,
    > > as xterm did.
    > >
    > > So guess we should fix xterm too?
    > >
    > > I think you're exaggerating.
    > >

    >
    > Hardly. vim clearly can deal with the ESC-is-prefix issue anyway, since
    > otherwise it wouldn't be able to use arrow keys.


    There's always the "timeout" hack. It is all right with the
    arrow and function keys because the second character in these
    cases (`[' usually) is not a commonly typed vim command.
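
    For readers unfamiliar with the "timeout" hack: after reading an ESC byte,
    the program waits briefly to see whether more input follows. A minimal
    sketch, assuming a POSIX system; the helper name esc_is_standalone and the
    choice of timeout are illustrative and not taken from vim's source:

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper. Call it right after reading an ESC byte from fd.
   Returns 1 if no further input arrives within timeout_ms (treat the ESC
   as a real Escape keypress), 0 if more input is pending (ESC is probably
   the prefix of a key sequence), and -1 on error. */
static int esc_is_standalone(int fd, int timeout_ms)
{
        struct pollfd p = { .fd = fd, .events = POLLIN };
        int n = poll(&p, 1, timeout_ms);

        if (n < 0)
                return -1;
        return n == 0;
}
```

    The weakness is visible in the signature: the answer depends on timing, so
    a slow link can turn an arrow key into ESC followed by ordinary commands.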

    > That being said, quite frankly, *both* Meta key conventions are
    > incredibly broken.


    Indeed, I agree with you here.
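
    To make the two conventions concrete, here is a rough sketch of what each
    one puts on the wire for Meta plus an ASCII key. The function names are
    made up for illustration:

```c
#include <stddef.h>

/* "Meta sets 8th bit": one byte, the key with bit 7 set. For ASCII keys
   the result lands in 0x80..0xFF, the range UTF-8 uses for multi-byte
   characters. */
static size_t meta_8th_bit(unsigned char key, unsigned char out[1])
{
        out[0] = key | 0x80;
        return 1;
}

/* "Meta sends ESC": two bytes, ESC (0x1B) followed by the key itself. */
static size_t meta_esc_prefix(unsigned char key, unsigned char out[2])
{
        out[0] = 0x1B;
        out[1] = key;
        return 2;
}
```

    With the first convention Meta-x becomes the single byte 0xF8, which is
    a printable character in Latin-1 but cannot stand alone as a character
    in UTF-8. With the second it becomes ESC x, which vi reads as "leave
    insert mode, then delete a character".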

    > What I would much prefer to see would be a brand new convention where
    > different keys (Ctrl, Meta, Super, Hyper, Alt or even in some cases
    > Shift) issue a unique prefix which doesn't conflict with anything else.
    > Emacs has tried to promote such a convention of the format
    > @ which is a lot better, although it's a bit
    > Emacs-centric (using / ^X as the initial character is not really a
    > very good choice.)
    >
    > The best probably would be to introduce an escape code, along the lines
    > of other escape codes in the terminal interface.


    You're right.

    Many say Unix is also broken compared to Plan 9.. sometimes it's
    too late. The real fix for this issue seems like it'd be very
    hard to accomplish. In the meantime, maybe we could do this easy
    fix. Or not. But we have a situation.

    > -hpa
    >





    --
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/

  2. Re: UTF-8 and Alt key in the console


    On Sunday 2008-03-23 19:13, John T. wrote:
    >> Hardly. vim clearly can deal with the ESC-is-prefix issue anyway, since
    >> otherwise it wouldn't be able to use arrow keys.

    >
    > There's always the "timeout" hack. It is all right with the
    > arrow and function keys because the second character in these
    > cases (`[' usually) is not a commonly typed vim command.
    >[...]
    >> The best probably would be to introduce an escape code, along the lines
    >> of other escape codes in the terminal interface.

    >
    > You're right.
    >
    > Many say Unix is also broken compared to Plan 9.. sometimes it's
    > too late. The real fix for this issue seems like it'd be very
    > hard to accomplish.


    The idea of revamping the escape codes is not all that bad.

    Thanks to terminfo, this should be easy. Change vt.c,
    add corresponding terminfo entry and set TERM to something
    that has not previously existed.

    About the ESC key, I thought, would it suffice to replace its
    current output of ^[ with ^[^[?

  3. Re: UTF-8 and Alt key in the console

    Jan Engelhardt wrote:
    >>> The best probably would be to introduce an escape code, along the lines
    >>> of other escape codes in the terminal interface.

    >>
    >> You're right.
    >>
    >> Many say Unix is also broken compared to Plan 9.. sometimes it's
    >> too late. The real fix for this issue seems like it'd be very
    >> hard to accomplish.

    >
    > The idea of revamping the escape codes is not all that bad.
    >
    > Thanks to terminfo, this should be easy. Change vt.c,
    > add corresponding terminfo entry and set TERM to something
    > that has not previously existed.
    >
    > About the ESC key, I thought, would it suffice to replace its
    > current output of ^[ with ^[^[?


    It would be better to assign a CSI (ESC [) code to it, like other
    function keys. Unfortunately, the terminal everyone tries to emulate
    (Linux does so quite poorly due to its broken implementation of ISO
    2022, but that's less of an issue with UTF-8), VT 220, had ESC on the
    F11 key, so the CSI 2 3 ~ sequence it uses we use for the F11 key.
    Doesn't mean we can't assign another one.

    One would also like to distinguish, say, Backspace from Ctrl-H. This is
    trickier, because the termios settings don't permit compound keys. The
    most obvious way to deal with that is an escape code for Ctrl-H, but
    that has the risk of breaking a lot of other things.

    -hpa

  4. Re: UTF-8 and Alt key in the console


    On Saturday 2008-03-29 00:26, H. Peter Anvin wrote:
    >>
    >> About the ESC key, I thought, would it suffice to replace its
    >> current output of ^[ with ^[^[?

    >
    > It would be better to assign a CSI (ESC [) code to it, like other function
    > keys. Unfortunately, the terminal everyone tries to emulate (Linux does so
    > quite poorly due to its broken implementation of ISO 2022, but that's less of
    > an issue with UTF-8), VT 220, had ESC on the F11 key, so the CSI 2 3 ~
    > sequence it uses we use for the F11 key. Doesn't mean we can't assign another
    > one.


    Even so, the linux term is the least broken one of all. I often had
    issues with remote login programs (largely Windows ones) that had a
    different idea of VTxxx whenever you wished not to have it. Despite
    TERM being vt100 and the local encoding being vt100 too, actual
    escape sequences were different from what programs in the shell
    expected. On one occasion, F keys worked, but Ins/Home did not,
    in another it was reversed, etc. As soon as I learnt of putty a
    few years ago I was happy to have all the mess that windows ssh
    programs cause solved because it implemented the "linux" term type
    and that just seemed to work out-of-the-box. So it does not seem
    as broken to me as VTxxx.

    > One would also like to distinguish, say, Backspace from Ctrl-H. This is
    > trickier, because the termios settings don't permit compound keys. The most
    > obvious way to deal with that is an escape code for Ctrl-H, but that has the
    > risk of breaking a lot of other things.


    Like what? I know that ^H is abused for screen effects.. not much
    you can do about it, but it is not that important anyway.

    As for ^H, all that I think is needed is the generation of an
    appropriate escape code for Ctrl-H and Backspace at the terminal
    emulator level (read: a pure xterm thing what key gets translated
    into what escape code), while the read side then interprets
    "ESC CTRLH", "ESC BKSP" and the traditional "^H".

    And while we are at it, I'd suggest a whole new set of escape
    codes, the current sequences are particularly... bad for
    stream synchronization. Right now one has to parse strings for
    end-of-escape.. which is awkward. I'd just be able to
    strchr(s, '^]') for example and know when the escape code
    ends. (Compat should of course be honored where necessary.)

  5. Re: UTF-8 and Alt key in the console

    Jan Engelhardt wrote:
    > And while we are at it, I'd suggest a whole new set of escape
    > codes, the current sequences are particularly... bad for
    > stream synchronization. Right now one has to parse strings for
    > end-of-escape.. which is awkward. I'd just be able to
    > strchr(s, '^]') for example and know when the escape code
    > ends. (Compat should of course be honored where necessary.)


    I think it would be a major lose to move away from ISO 6429 format; the
    format is self-terminating and really isn't all that complex.

    -hpa

  6. Re: UTF-8 and Alt key in the console


    On Saturday 2008-03-29 01:23, H. Peter Anvin wrote:
    >> And while we are at it, I'd suggest a whole new set of escape
    >> codes, the current sequences are particularly... bad for
    >> stream synchronization. Right now one has to parse strings for
    >> end-of-escape.. which is awkward. I'd just be able to
    >> strchr(s, '^]') for example and know when the escape code
    >> ends. (Compat should of course be honored where necessary.)

    >
    > I think it would be a major lose to move away from ISO 6429 format; the format
    > is self-terminating and really isn't all that complex.


    What do you mean by self-terminating? There is no easy
    synchronization like in UTF-8, given you are anywhere inside
    a text stream, how do you know (a) you are already in an
    escape sequence and (b) how to figure out the rebegin of
    normal text.

  7. Re: UTF-8 and Alt key in the console

    Jan Engelhardt wrote:
    >>
    >> I think it would be a major lose to move away from ISO 6429 format;
    >> the format is self-terminating and really isn't all that complex.

    >
    > What do you mean by self-terminating? There is no easy
    > synchronization like in UTF-8, given you are anywhere inside
    > a text stream, how do you know (a) you are already in an
    > escape sequence and (b) how to figure out the rebegin of
    > normal text.


    (a) isn't readily supported (other than scanning backwards), but (b) is
    pretty easy, see ISO 6429.

    -hpa

  8. Re: UTF-8 and Alt key in the console

    Jan Engelhardt wrote:
    > What do you mean by self-terminating? There is no easy
    > synchronization like in UTF-8, given you are anywhere inside
    > a text stream, how do you know (a) you are already in an
    > escape sequence and (b) how to figure out the rebegin of
    > normal text.


    It's not very useful being able to tell you are inside an escape sequence
    unless you see that sequence from the start. You do need the complete
    sequence to make sense of it.

  9. Re: UTF-8 and Alt key in the console

    David Newall wrote:
    > Jan Engelhardt wrote:
    >> What do you mean by self-terminating? There is no easy
    >> synchronization like in UTF-8, given you are anywhere inside
    >> a text stream, how do you know (a) you are already in an
    >> escape sequence and (b) how to figure out the rebegin of
    >> normal text.

    >
    > It's not very useful being able to tell you are inside an escape sequence
    > unless you see that sequence from the start. You do need the complete
    > sequence to make sense of it.


    I think what Jan is alluding to is the property of UTF-8 text that you
    can start in the middle of a string and either skip an incomplete
    character or find the beginning of it. If you can search backwards, you
    can find the beginning of an escape sequence, too; the "skip incomplete"
    functionality is missing, though, but as you say, isn't actually all
    that useful in real life *for the applications which use these kinds of
    escape sequences.*

    -hpa

  10. Re: UTF-8 and Alt key in the console


    On Saturday 2008-03-29 18:05, H. Peter Anvin wrote:
    > David Newall wrote:
    >> Jan Engelhardt wrote:
    >> > What do you mean by self-terminating? There is no easy
    >> > synchronization like in UTF-8, given you are anywhere inside
    >> > a text stream, how do you know (a) you are already in an
    >> > escape sequence and (b) how to figure out the rebegin of
    >> > normal text.

    >>
    >> It's not very useful being able to tell you are inside an escape sequence
    >> unless you see that sequence from the start. You do need the complete
    >> sequence to make sense of it.

    >
    > I think what Jan is alluding to is the property of UTF-8 text that you can
    > start in the middle of a string and either skip an incomplete character or
    > find the beginning of it. If you can search backwards, you can find the
    > beginning of an escape sequence, too; the "skip incomplete" functionality is
    > missing, though, but as you say, isn't actually all that useful in real life
    > *for the applications which use these kinds of escape sequences.*


    No backwards searching, just forwards.

    In UTF-8 this is simple. You know you are in the middle of a character
    when the highest two bits are 10, and you can skip bytes until the start
    of the next character, whose highest bit is 0 (ASCII) or whose highest
    two bits are 11 (lead byte).
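
    The forward skip described here fits in a few lines of C. This is only a
    sketch; utf8_resync is a name chosen for the example:

```c
#include <stddef.h>

/* Starting at pos, skip UTF-8 continuation bytes (bit pattern 10xxxxxx)
   and return the offset of the next byte that can start a character.
   Returns len if the buffer ends first. */
static size_t utf8_resync(const unsigned char *buf, size_t len, size_t pos)
{
        while (pos < len && (buf[pos] & 0xC0) == 0x80)
                pos++;
        return pos;
}
```

    Only the current byte is inspected at each step, which is the property
    being contrasted with escape sequences.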

    With the VTxxx escape codes, this is hardly possible. Given a broken
    code of ^[[43m,

    echo -e '\x1B[43m wonderful \x1B[0m' | cosmicrays | cat

    3m wonderful ^[[0m

    There is no way to check whether you are in the escape code. And there
    is no way to find its end. If a heuristic were to be used (which is
    certainly a possibility), you would end up killing text up until the
    next ^[.

    Hence the proposal of using definite start and end markers:

    echo -e '\x1B43m\x1D wonderful \x1B0m\x1D' | cosmicrays | cat

    3m^] wonderful ^[0m^]

    Ok, finding out whether we are in an escape code is not as easy as with
    UTF-8 (the latter of which looks at the current character only), but
    still very viable.
    Prerequisite to this simple model is that the user does not use an
    overly long dumb escape sequence like ^[[43;43;43;43;43;43m, i.e.
    that the end marker is in the buffer if we really are in an escape
    sequence:

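    #include <stdbool.h>
    #include <string.h>

    /* True if an end marker (0x1D) precedes the next start marker (0x1B). */
    static bool in_an_escape_seq(const char *buf)
    {
            const char *e = strchr(buf, 0x1D);
            const char *s = strchr(buf, 0x1B);
            return e != NULL && (s == NULL || e < s);
    }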

    If so, skipping parts of a faulty write() is easy:

    static const char *get_out_of_esc(const char *buf)
    {
            if (in_an_escape_seq(buf))
                    return strchr(buf, 0x1D) + 1;
            else
                    return buf;
    }


    --
    make boldconfig -- to boldly select what no one has selected before

  11. Re: UTF-8 and Alt key in the console

    Jan Engelhardt wrote:
    >
    > There is no way to check whether you are in the escape code. And there
    > is no way to find its end.


    Right, and wrong, respectively. Read the standard.

    -hpa

  12. Re: UTF-8 and Alt key in the console

    Jan Engelhardt wrote:
    > Hence the proposal of using definite start and end markers:
    >
    > echo -e '\x1B43m\x1D wonderful \x1B0m\x1D' | cosmicrays | cat


    I see no merit in the idea. Most seriously, there isn't any real-world
    problem being solved. In addition, it proposes creating yet another
    type of terminal emulation. If there's something you don't like about
    VT escape codes, use a different emulation. For example, Televideo
    terminals used almost exclusively single-character control codes,
    reducing the scope of being mid-sequence to, well much closer to zero.

    You need to make quite clear that your proposal is to discontinue use of
    VT terminal emulation.

  13. Re: UTF-8 and Alt key in the console

    David Newall wrote:
    > Jan Engelhardt wrote:
    >> Hence the proposal of using definite start and end markers:
    >>
    >> echo -e '\x1B43m\x1D wonderful \x1B0m\x1D' | cosmicrays | cat

    >
    > I see no merit in the idea. Most seriously, there isn't any real-world
    > problem being solved. In addition, it proposes creating yet another
    > type of terminal emulation. If there's something you don't like about
    > VT escape codes, use a different emulation. For example, Televideo
    > terminals used almost exclusively single-character control codes,
    > reducing the scope of being mid-sequence to, well much closer to zero.
    >
    > You need to make quite clear that your proposal is to discontinue use of
    > VT terminal emulation.


    Okay, let's put this to rest once and for all:

    *** ISO 6429 sequences are self-terminating. ***

    No, you can't tell you're inside one if you miss the leading CSI, but as
    has been pointed out, there really isn't a huge case for it.

    The standard is available for free under the name ECMA-48:
    http://www.ecma-international.org/pu...T/Ecma-048.pdf

    It references ISO 2022, a.k.a. ECMA-35:
    http://www.ecma-international.org/pu...T/Ecma-035.pdf


    These standards use a decimalized hexadecimal notation, so if you see
    "05/10" it means 0x5a. A "column" refers to a 16-character set, so
    "column 4" refers to bytes 0x40 to 0x4f.

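    As a quick illustration of that notation, a small helper (the name
    colrow_to_byte is invented for this example) converts "column/row" to a
    byte value:

```c
/* Convert the "column/row" notation used in ECMA-48 and ECMA-35
   (for example 05/10) to a byte value. Returns -1 if either part is
   outside 0..15. */
static int colrow_to_byte(int column, int row)
{
        if (column < 0 || column > 15 || row < 0 || row > 15)
                return -1;
        return column * 16 + row;
}
```

    So 01/11 is 0x1B (ESC), 05/11 is 0x5B ('['), and 09/11 is 0x9B (the
    8-bit CSI).
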

    The structure defined in section 5.4 of ISO 6429/ECMA-48:

    -----------
    5.4 Control sequences
    A control sequence is a string of bit combinations starting with the
    control function CONTROL SEQUENCE INTRODUCER (CSI) followed by one or
    more bit combinations representing parameters, if any, and by one or
    more bit combinations identifying the control function. The control
    function CSI itself is an element of the C1 set.
    The format of a control sequence is
    CSI P ... P I ... I F
    where
    a) CSI is represented by bit combinations 01/11 (representing ESC) and
    05/11 in a 7-bit code or by bit combination 09/11 in an 8-bit code, see 5.3;
    b) P ... P are Parameter Bytes, which, if present, consist of bit
    combinations from 03/00 to 03/15;
    c) I ... I are Intermediate Bytes, which, if present, consist of bit
    combinations from 02/00 to 02/15. Together with the Final Byte F, they
    identify the control function;
    NOTE The number of Intermediate Bytes is not limited by this Standard;
    in practice, one Intermediate Byte will be sufficient since with sixteen
    different bit combinations available for the Intermediate Byte over one
    thousand control functions may be identified.
    d) F is the Final Byte; it consists of a bit combination from 04/00 to
    07/14; it terminates the control sequence and together with the
    Intermediate Bytes, if present, identifies the control function. Bit
    combinations 07/00 to 07/14 are available as Final Bytes of control
    sequences for private (or experimental) use.
    -----------

    Note: DEC added nonstandard control sequences initiated with SS3 (ESC O)
    as well as CSI (ESC [); otherwise they use the same format.

    The Final Byte is easy enough to spot, as is writing a generic parser which
    can pick this apart, including parameter handling.
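
    A minimal sketch of such a scanner, following the byte ranges in the
    quoted section 5.4. It assumes the caller has already consumed the CSI
    itself; csi_body_len is an illustrative name, and real parsers also
    handle parameters, SS3 and embedded control characters:

```c
#include <stddef.h>

/* buf points just past a CSI (ESC [ in 7-bit form, 0x9B in 8-bit form).
   Returns the number of bytes up to and including the Final Byte, or 0
   if the sequence is malformed or not yet complete within len bytes. */
static size_t csi_body_len(const unsigned char *buf, size_t len)
{
        size_t i = 0;

        /* Parameter Bytes: 03/00 to 03/15 */
        while (i < len && buf[i] >= 0x30 && buf[i] <= 0x3F)
                i++;
        /* Intermediate Bytes: 02/00 to 02/15 */
        while (i < len && buf[i] >= 0x20 && buf[i] <= 0x2F)
                i++;
        /* Final Byte: 04/00 to 07/14 terminates the sequence */
        if (i < len && buf[i] >= 0x40 && buf[i] <= 0x7E)
                return i + 1;
        return 0;
}
```

    For the body "43m wonderful" this stops after the "m". It finds the end
    of a sequence whose start was seen; it does not tell a reader who missed
    the CSI that "3m" was ever part of one.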

    -hpa

  14. Re: UTF-8 and Alt key in the console

    H. Peter Anvin wrote:

    > One would also like to distinguish, say, Backspace from Ctrl-H. This is
    > trickier, because the termios settings don't permit compound keys. The
    > most obvious way to deal with that is an escape code for Ctrl-H, but
    > that has the risk of breaking a lot of other things.


    Backspace is not a problem, since it generates ^? (DEL/127) on Linux
    since the early days.

    It would be really nice to be able to get arbitrary modifier combinations for all keys
    and a separate combination for the escape key.

    Mark

  15. Re: UTF-8 and Alt key in the console

    Marko Macek wrote:
    > H. Peter Anvin wrote:
    >
    >> One would also like to distinguish, say, Backspace from Ctrl-H. This
    >> is trickier, because the termios settings don't permit compound
    >> keys. The most obvious way to deal with that is an escape code for
    >> Ctrl-H, but that has the risk of breaking a lot of other things.

    >
    > Backspace is not a problem, since it generates ^? (DEL/127) on Linux
    > since the early days.


    And yet, Ctrl/H *is* backspace. Look it up in any ASCII chart. Let's
    not make a virtue out of ignoring or breaking standards.
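
    For anyone without a chart to hand, the two values being argued over can
    be checked with a short sketch (ctrl_code and the enum names are
    illustrative, not from any library):

```c
/* ASCII code points for the two "erase" characters under discussion. */
enum { ASCII_BS = 0x08, ASCII_DEL = 0x7F };

/* A control character is the key's ASCII code with the upper bits
   cleared, so Ctrl-H gives 0x08 (BS) and Ctrl-[ gives 0x1B (ESC). */
static unsigned char ctrl_code(unsigned char key)
{
        return key & 0x1F;
}
```

    ctrl_code('H') equals ASCII_BS, while the Linux console's Backspace key
    sends ASCII_DEL, which is why the two can be told apart there.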

  16. Re: UTF-8 and Alt key in the console

    Marko Macek wrote:
    > H. Peter Anvin wrote:
    >
    >> One would also like to distinguish, say, Backspace from Ctrl-H. This
    >> is trickier, because the termios settings don't permit compound keys.
    >> The most obvious way to deal with that is an escape code for Ctrl-H,
    >> but that has the risk of breaking a lot of other things.

    >
    > Backspace is not a problem, since it generates ^? (DEL/127) on Linux
    > since the early days.
    >
    > It would be really nice to be able to get arbitrary modifier combinations
    > for all keys and a separate combination for the escape key.
    >


    Yes; this probably needs to be modal, but we can probably live with that.

    -hpa
