[9fans] simplicity - Plan9

Thread: [9fans] simplicity

  1. Re: [9fans] simplicity

    Uriel wrote:
    > found this gem in one of the many X headers:
    > #define NBBY 8 /* number of bits in a byte */


    So what is supposed to be wrong with using a manifest constant
    instead of hard-coding "8" in various places? As I recall,
    The Elements of Programming Style recommended this approach.

    Similar definitions have been in Unix system headers for
    decades. CHAR_BIT is defined in <limits.h>. (Yes, I know
    there is a difference between a char and a byte. Less well
    known, there is a difference between a byte and an octet.)
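
    (As a small illustration of the idiom in question, not from the
    original post: deriving a width from CHAR_BIT instead of writing a
    literal 8 keeps the intent visible.

    #include <limits.h>
    #include <stdio.h>

    int
    main(void)
    {
        /* width of int in bits, computed rather than hard-coded */
        printf("int is %d bits\n", (int)(sizeof(int) * CHAR_BIT));
        return 0;
    }

    )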

    I'm not saying that some of the complaints don't have a
    point, especially when important tools perform poorly.
    However, I've observed an unusual degree of arrogance in
    the Plan 9 newsgroup, approaching religion. Plan 9's way
    of doing things is not the only intelligent way; others
    may have different goals and constraints that affect how
    they do things in their particular environments.

  2. Re: [9fans] simplicity

    > So what is supposed to be wrong with using a manifest constant
    > instead of hard-coding "8" in various places? As I recall,
    > The Elements of Programming Style recommended this approach.


    i see two problems with this sort of indirection. if i see NBBY
    in the code, i have to look up its value. NBBY doesn't mean anything
    to me. this layer of mental gymnastics makes the code hard
    to read and understand. on the other hand, 8 means something to me.

    more importantly, it implies that the code would work with NBBY
    of 10 or 12. (the c standard says you can't have < 8; §5.2.4.2.1.)
    i'd bet there are many things in the code that depend on the size of
    a byte that don't reference NBBY.

    so this define goes 0 fer 2. it can't be changed and it is not informative.
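
    to illustrate the second point (a hedged sketch, not from the
    original mail): code like this bakes 8-bit bytes into its shifts and
    masks, and no NBBY definition would save it.

    /* pack a 32-bit value big-endian; the 8s and 0xff assume octets */
    void
    put32(unsigned char *p, unsigned long v)
    {
        p[0] = (v >> 24) & 0xff;
        p[1] = (v >> 16) & 0xff;
        p[2] = (v >> 8) & 0xff;
        p[3] = v & 0xff;
    }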

    > Similar definitions have been in Unix system headers for
    > decades. CHAR_BIT is defined in <limits.h>. (Yes, I know
    > there is a difference between a char and a byte. Less well
    > known, there is a difference between a byte and an octet.)


    this mightn't be the right place to defend a practice by saying that
    "unix systems have been doing it for years."

    - erik


  3. Re: [9fans] simplicity

    >Less well known, there is a difference between a byte and an octet.

    grep octet /sys/games/lib/fortunes
    20 octets is 160 guys playing flutes -- rob

    easily one of my favourites


  4. Re: [9fans] simplicity

    On 9/19/07, Douglas A. Gwyn wrote:
    > I'm not saying that some of the complaints don't have a
    > point, especially when important tools perform poorly.
    > However, I've observed an unusual degree of arrogance in
    > the Plan 9 newsgroup, approaching religion. Plan 9's way
    > of doing things is not the only intelligent way; others
    > may have different goals and constraints that affect how
    > they do things in their particular environments.
    >


    imho a big problem is that in the places mentioned, every environment
    is always thought of as a particular one.

    iru

  5. Re: [9fans] simplicity

    > i see two problems with this sort of indirection. if i see NBBY
    > in the code, i have to look up its value. NBBY doesn't mean anything
    > to me. this layer of mental gymnastics makes the code hard
    > to read and understand. on the other hand, 8 means something to me.
    >
    > more importantly, it implies that the code would work with NBBY
    > of 10 or 12. (the c standard says you can't have < 8; §5.2.4.2.1.)
    > i'd bet there are many things in the code that depend on the size of
    > a byte that don't reference NBBY.
    >
    > so this define goes 0 fer 2. it can't be changed and it is not informative.


    8 can be a lot of things besides the number of bits in a byte
    (the number of bytes in a double or vlong, for example).
    if you're doing enough conversions between byte counts
    and bit counts, then using NBBY makes it clear *why* you're
    using an 8 there, which might help a lot.

    in other contexts, it might not be worth the effort.
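
    a sketch of the kind of use described above (illustrative; the
    macros are an assumption in the style of BSD headers, not quoted
    from any system):

    #define NBBY 8  /* number of bits in a byte */

    /* bytes needed to hold n bits; NBBY says *why* the 8 is there */
    #define BITMAPSIZE(n)   (((n) + NBBY - 1) / NBBY)

    /* set bit i in byte array b */
    #define SETBIT(b, i)    ((b)[(i)/NBBY] |= 1 << ((i)%NBBY))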

    jumping all over a #define without seeing how or
    why it is being used is not productive. nor interesting.
    in fact i can't believe i'm writing this. sorry.

    russ


  6. Re: [9fans] simplicity

    > However, I've observed an unusual degree of arrogance in
    > the Plan 9 newsgroup, approaching religion.


    elitism, not arrogance.

    "I don't want to belong to any club that will accept me as a member." - Groucho Marx


  7. Re: [9fans] simplicity

    In article <5d375e920709180838t4070c23al11bc0eb5cc7280c9@mail.gmail.com> Uriel wrote:
    >Don't complain, at least it is not producing random behaviour, I have
    >seen versions of gnu awk that, when fed plain ASCII input, if the
    >locale was UTF-8, rules would match random lines of input, the fix?
    >set the locale to 'C' at the top of all your scripts (and don't even
    >think of dealing with files which actually contain non-ASCII UTF-8).
    >
    >This was some years ago, it might be fixed by now, but it demonstrates
    >how the locale insanity makes life so much more fun.


    It likely is fixed by now. If not, I'd like to have a sample program and
    data and locale name to test under. And the truth is, even if it doesn't work,
    I can blame the library routines and locale and not my code. :-)

    Testing should be performed using current sources, available via anonymous
    CVS from savannah.gnu.org, check out the gawk-stable module. From CVS use:

    ./bootstrap.sh
    ./configure && make && make check

    to build on a Unix or Linux system.

    I hope to make a formal release in the next few weeks.

    As to the original thread, yeah, configure (= autoconf + automake +
    libtool + gnulib) has gotten way too hairy to handle. I don't use gnulib
    on principle: I have the gut feeling that the configuration goop would
    likely outweigh the source code in line count.

    The only reason I added Automake support was to get GNU Gettext, which
    on balance is a good thing. Locales, on the other hand, I think are
    very painful. I hope that people who use them find them valuable (I'm
    a parochial English-speaking American myself, so ASCII is usually
    enough for me.)

    My two cents,

    Arnold
    --
    Aharon (Arnold) Robbins arnold AT skeeve DOT com
    P.O. Box 354 Home Phone: +972 8 979-0381 Fax: +1 206 202 4333
    Nof Ayalon Cell Phone: +972 50 729-7545
    D.N. Shimshon 99785 ISRAEL

  8. Re: [9fans] simplicity

    > >This was some years ago, it might be fixed by now, but it demonstrates
    > >how the locale insanity makes life so much more fun.

    >
    > It likely is fixed by now. If not, I'd like to have a sample program and
    > data and locale name to test under. And the truth is, even if it doesn't work,
    > I can blame the library routines and locale and not my code. :-)


    Yes, it is likely fixed now, and it was very likely a bug in the
    libraries rather than awk, but it illustrates the kinds of problems
    locales create. And I can tell you, in a production environment it can
    be a pain when who knows what tool who knows where in your whole
    system starts to misbehave because it is not happy with your locale.

    I also find it most sad how, in the name of 'localization', the output
    of many tools (especially error messages) has become unpredictable. It
    makes providing support most fun when you ask people "can you copy
    paste the output you get when you run this", and they answer with a
    bunch of stuff in Aramaic. If you use unix, you are supposed to
    understand English, period. (Or what is next? will they have a set of
    'magic symlinks' that links '/bin/gato' to '/bin/cat' if your locale
    is in Spanish?)

    And now that you mention Gettext, if only I could get back all the
    time I wasted trying to compile some stupid program (that should never
    have been 'localized' in the first place) which is somehow unhappy
    about the gettext version I have (or the other way around)...

    uriel

    P.S.: Oh, and people who insist on using encodings other than UTF-8
    should be locked up in padded cells (without access to computers and
    ideally even without electricity, unless it is to help them
    electrocute themselves) for the good of mankind.

  9. Re: [9fans] simplicity

    Yes, old thread, sorry. Blame Uriel.

    On 9/18/07, Douglas A. Gwyn wrote:
    > erik quanstrom wrote:
    > > suppose Linux user a and user b grep the same "text" file for the same string.
    > > results will depend on the users' locales.

    >
    > But if they're trying to match an alphabetic character class, the
    > result *should* depend on the locale.


    This baffles me. Can anyone think of examples where one might want
    differing results depending on one's locale?

    -Jack

  10. Re: [9fans] simplicity

    > Yes, old thread, sorry. Blame Uriel.
    >
    > On 9/18/07, Douglas A. Gwyn wrote:
    > > erik quanstrom wrote:
    > > > suppose Linux user a and user b grep the same "text" file for the same string.
    > > > results will depend on the users' locales.

    > >
    > > But if they're trying to match an alphabetic character class, the
    > > result *should* depend on the locale.

    >
    > This baffles me. Can anyone think of examples where one might want
    > differing results depending on one's locale?
    >
    > -Jack


    i think i see what the reasoning is. the thought is that, e.g.,
    in spanish [a-z] should match ñ.

    the problem is this means that grep(regexp, data) now
    returns a set of results, one for each locale.

    so on the one hand, one would like [a-z] to do the Right Thing,
    depending on language. and on the other hand, one wants
    grep(regexp, data) to return a single result.

    i think the way to see through this issue is to notice that
    the reason we want ñ to be in [a-z] is because of visual
    similarity. what if we were dealing with chinese? i think
    it's pretty clear that [a-z] should map to a contiguous set
    of unicode codepoints.

    if you want to deal with ñ, the unicode tables do note that ñ
    is n + combining ~, so one could come up with a new
    denotation for base codepoints. unfortunately, combining
    that with existing regexps would be a bit painful.

    - erik
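
    a small illustration of the "set of results" point (a sketch, not
    from the post; locale names vary by system, so es_ES.UTF-8 is an
    assumption):

    #include <locale.h>
    #include <regex.h>
    #include <stdio.h>

    /* does pattern match s under locale loc? */
    static const char *
    try(const char *loc, const char *pattern, const char *s)
    {
        regex_t re;
        int r;

        setlocale(LC_ALL, loc);
        regcomp(&re, pattern, REG_NOSUB);
        r = regexec(&re, s, 0, NULL, 0);
        regfree(&re);
        return r == 0 ? "match" : "no match";
    }

    int
    main(void)
    {
        const char *n = "\xc3\xb1";     /* ñ in utf-8 */

        /* same regexp, same data, different answers per locale */
        printf("C:           %s\n", try("C", "[[:alpha:]]", n));
        printf("es_ES.UTF-8: %s\n", try("es_ES.UTF-8", "[[:alpha:]]", n));
        return 0;
    }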

  11. Re: [9fans] simplicity

    On 9/18/07, Uriel wrote:
    > Don't complain, at least it is not producing random behaviour, I have
    > seen versions of gnu awk that, when fed plain ASCII input, if the
    > locale was UTF-8, rules would match random lines of input, the fix?
    > set the locale to 'C' at the top of all your scripts (and don't even
    > think of dealing with files which actually contain non-ASCII UTF-8).
    >
    > This was some years ago, it might be fixed by now, but it demonstrates
    > how the locale insanity makes life so much more fun.


    Heh, funny that this thread got revived the very day that my
    colleague's backup script choked because he was running in a utf8
    locale and hit a filename encoded in iso8859-1. Apparently GNU sed's .
    stops matching when it hits an invalid bytestream (which is not
    entirely unreasonable I guess).
    -sqweek
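
    for the record, this is easy to reproduce with the C regexp
    interface (a sketch; glibc-like behaviour is assumed, as is the
    en_US.UTF-8 locale name):

    #include <locale.h>
    #include <regex.h>
    #include <stdio.h>

    int
    main(void)
    {
        regex_t re;
        /* 0xe9 is 'é' in iso8859-1 and an invalid byte in utf-8 */
        const char *name = "caf\xe9.txt";

        setlocale(LC_ALL, "en_US.UTF-8");
        regcomp(&re, "caf..txt", REG_NOSUB);
        /* '.' refuses the invalid byte sequence: no match */
        printf("utf-8: %s\n",
            regexec(&re, name, 0, NULL, 0) == 0 ? "match" : "no match");
        regfree(&re);

        setlocale(LC_ALL, "C");
        regcomp(&re, "caf..txt", REG_NOSUB);
        /* in the C locale every byte is just a byte: match */
        printf("C:     %s\n",
            regexec(&re, name, 0, NULL, 0) == 0 ? "match" : "no match");
        regfree(&re);
        return 0;
    }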

  12. Re: [9fans] simplicity

    On 10/9/07, erik quanstrom wrote:
    > i think i see what the reasoning is. the thought is that, e.g.,
    > in spanish [a-z] should match ñ.


    Ah, thanks!

    I was thinking of the simplistic scenario, where someone might be
    looking for niño in some file, regardless of what locale they might
    happen to be in. Now I can imagine the nightmare it must be for
    non-English speakers looking for letter combinations irrespective of
    accents.

    But it seems more like a problem with the shorthand than grep, per
    se. I could see an argument for [:alpha:] potentially matching n and
    ñ depending on the locale, but [a-z] not matching ñ in any locale.
    Even then, my tendency would be for [:alpha:] to match ñ in every
    locale.

    But then, does [:alpha:] match ἀγαθός? How ironic that it doesn't match α.

    What an ugly problem.

    -Jack

  13. Re: [9fans] simplicity

    > Heh, funny that this thread got revived the very day that my
    > colleague's backup script choked because he was running in a utf8
    > locale and hit a filename encoded in iso8859-1. Apparently GNU sed's .
    > stops matching when it hits an invalid bytestream (which is not
    > entirely unreasonable I guess).
    > -sqweek


    clearly in their world, it is unreasonable.

    - erik

  14. Re: [9fans] simplicity

    > I was thinking of the simplistic scenario, where someone might be
    > looking for niño in some file, regardless of what locale they might
    > happen to be in. Now I can imagine the nightmare it must be for
    > non-English speakers looking for letter combinations irrespective of
    > accents.
    >
    > But, it seems more like a problem with the shorthand than grep, per
    > se.


    i agree with this. or it's a historical problem with the character set.
    clearly if you were designing a universal character set with no compatibility
    constraints, the alphabet would have n and ñ together so [a-z] would
    match both.

    > I could see an argument for [:alpha:] potentially matching n and
    > ñ depending on the locale, but [a-z] not matching ñ in any locale.
    > Even then, my tendency would be for [:alpha:] to match ñ in every
    > locale.
    >
    > But then, does [:alpha:] match ἀγαθός? How ironic that it doesn't match α.


    i don't think one can go this route. you can't have a magic environment
    variable that changes everything. testing is a nightmare in such a world.
    you have to go through every combination of (data character set, locale)
    to see if things are working.

    a better solution is to use the properties of unicode. ñ is noted in the
    table as

    00f1;latin small letter n with tilde;ll;0;l;006e 0303;;;;n;latin small letter n tilde;;00d1;;00d1

    field 6 has the base codepoint 006e as its first subfield. it would not be hard
    to build a table quickly mapping a codepoint to its base codepoint.
    but it would probably be most useful to also have a mapping from
    base codepoints to all composed forms ξ.

    suppose, for lack of creativity, we use » to mean all codepoints whose
    base is the next character, so »a matches ä, as does »[a-z].
    so »c, for a letter c, can be grepped by taking ξ(c), which results
    in a character class.

    plan 9 already has some of this in the c library with tolowerrune, etc.
    i did some work with this some time ago and wrote some rc scripts to
    generate the to*rune tables from the unicode standard data. it would
    be easy to adapt them to generate ξ and the base-codepoint table.
    (the tables would be pretty big.)
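
    a rough sketch of the table build (illustrative; in c rather than
    rc, and the parsing assumes the UnicodeData.txt layout quoted
    above):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* read UnicodeData.txt on stdin; print codepoint/base pairs for
       every character whose decomposition (field 6) starts with a
       base codepoint. inverting the pairs gives ξ, the mapping from
       base codepoints to composed forms. */
    int
    main(void)
    {
        char line[1024], *field[15], *p;
        int i;

        while (fgets(line, sizeof line, stdin) != NULL) {
            for (i = 0, p = line; i < 15 && p != NULL; i++) {
                field[i] = p;
                p = strchr(p, ';');
                if (p != NULL)
                    *p++ = '\0';
            }
            /* skip entries with no decomposition or a <tagged> one */
            if (i < 6 || field[5][0] == '\0' || field[5][0] == '<')
                continue;
            printf("%04lx %04lx\n",
                strtoul(field[0], NULL, 16),
                strtoul(field[5], NULL, 16));
        }
        return 0;
    }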

    >
    > What an ugly problem.


    it can be made ugly quickly. but i'm not convinced that all approaches
    to this problem are bad.

    - erik
