flex: doubt regarding the meaning of some character classes - Unix

  1. flex: doubt regarding the meaning of some character classes

    After browsing the flex manual I've stumbled on the patterns section
    where it is said that patterns can be expressed with the help of
    character class expressions. The patterns are as follows:

    [:alnum:] [:alpha:] [:blank:]
    [:cntrl:] [:digit:] [:graph:]
    [:lower:] [:print:] [:punct:]
    [:space:] [:upper:] [:xdigit:]

    Some of those patterns appear to be straightforward, but I don't have a
    clue regarding the exact meaning of a whole bunch of them. To make
    matters worse, it appears that there is absolutely no information
    covering that, neither in the flex manual nor anywhere else.

    So, could anyone please post the meaning of those character classes?


    Thanks in advance
    Rui Maciel

  2. Re: flex: doubt regarding the meaning of some character classes

    On 19 Oct 2008 21:02:20 GMT,
    Rui Maciel wrote:
    > After browsing the flex manual I've stumbled on the patterns section
    > where it is said that patterns can be expressed with the help of
    > character class expressions. The patterns are as follows:
    >
    > [:alnum:] [:alpha:] [:blank:]
    > [:cntrl:] [:digit:] [:graph:]
    > [:lower:] [:print:] [:punct:]
    > [:space:] [:upper:] [:xdigit:]
    >
    > Some of those patterns appear to be straightforward, but I don't have a
    > clue regarding the exact meaning of a whole bunch of them. To make
    > matters worse, it appears that there is absolutely no information
    > covering that, neither in the flex manual nor anywhere else.


    The POSIX character classes all correspond to functions of the same name
    with 'is' prepended. So, the character class [:digit:] has a
    corresponding C function isdigit(), which you can look up for the
    definition.
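    A minimal C sketch of this correspondence, assuming a hosted C99
    compiler: each [:name:] class is paired with the <ctype.h> predicate
    is<name>(). The table and the helpers class_pred() and in_class() are
    hypothetical illustrations, not part of flex. The predicates answer for
    the current LC_CTYPE locale, which is "C" at program startup.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Signature shared by all <ctype.h> classifiers. */
typedef int (*ctype_pred)(int);

/* [:name:] in a bracket expression <-> is<name>() in <ctype.h>. */
static const struct {
    const char *name;
    ctype_pred pred;
} posix_classes[] = {
    { "alnum", isalnum }, { "alpha", isalpha }, { "blank", isblank },
    { "cntrl", iscntrl }, { "digit", isdigit }, { "graph", isgraph },
    { "lower", islower }, { "print", isprint }, { "punct", ispunct },
    { "space", isspace }, { "upper", isupper }, { "xdigit", isxdigit },
};

/* Returns the classifier for a class name such as "digit" (without the
   brackets and colons), or NULL if it is not one of the twelve classes. */
static ctype_pred class_pred(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof posix_classes / sizeof posix_classes[0]; i++)
        if (strcmp(posix_classes[i].name, name) == 0)
            return posix_classes[i].pred;
    return NULL;
}

/* Nonzero if byte c belongs to [:name:] in the current LC_CTYPE locale. */
static int in_class(const char *name, unsigned char c)
{
    ctype_pred pred = class_pred(name);
    return pred != NULL && pred(c) != 0;
}
```

    In the "C" locale, in_class("blank", '\t') and in_class("space", '\n')
    are both true, but in_class("blank", '\n') is not: [:blank:] is only
    space and horizontal tab, while [:space:] also includes newline,
    vertical tab, form feed and carriage return.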

    Martien
    --
    Martien Verbruggen
    Curiouser and curiouser, said Alice.

  3. Re: flex: doubt regarding the meaning of some character classes

    On Oct 20, 12:37 am, Martien Verbruggen
    wrote:
    > On 19 Oct 2008 21:02:20 GMT,
    > Rui Maciel wrote:


    > > After browsing the flex manual I've stumbled on the patterns
    > > section where it is said that patterns can be expressed with
    > > the help of character class expressions. The patterns are as
    > > follows:


    > > [:alnum:] [:alpha:] [:blank:]
    > > [:cntrl:] [:digit:] [:graph:]
    > > [:lower:] [:print:] [:punct:]
    > > [:space:] [:upper:] [:xdigit:]


    > > Some of those patterns appear to be straightforward, but I
    > > don't have a clue regarding the exact meaning of a whole
    > > bunch of them. To make matters worse, it appears that there
    > > is absolutely no information covering that, neither in the
    > > flex manual nor anywhere else.


    > The POSIX character classes all correspond to functions of the
    > same name with 'is' prepended. So, the character class
    > [:digit:] has a corresponding C function isdigit(), which
    > you can look up for the definition.


    Which doesn't really answer much. The C functions are locale
    dependent; is this also true of the flex character classes? And
    if so, which locale: the one when flex is run, or the one when
    the generated program is run? (I can make some guesses, but
    this is the sort of thing which absolutely must be documented
    for any serious program.)
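    The locale dependence James mentions is easy to observe in C: the same
    predicate gives different answers depending on the LC_CTYPE locale in
    effect. The helper count_in_locale() below is a hypothetical
    illustration; which locale names you can pass depends on what is
    installed on your system.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>

/* Switches LC_CTYPE to `locale` and counts how many byte values 0..255
   satisfy pred there. Returns -1 (leaving LC_CTYPE unchanged) if the
   locale is not available. setlocale() changes a process-wide setting,
   so this is not meant for multithreaded code. */
static int count_in_locale(const char *locale, int (*pred)(int))
{
    assert(locale != NULL && pred != NULL);
    if (setlocale(LC_CTYPE, locale) == NULL)
        return -1;
    int n = 0;
    for (int c = 0; c <= 255; c++)   /* every unsigned char value */
        if (pred(c))
            n++;
    return n;
}
```

    In the "C" locale isalpha() accepts exactly the 52 ASCII letters, so
    count_in_locale("C", isalpha) is 52. On a system with an ISO-8859-1
    locale installed (a name such as "fr_FR.ISO-8859-1" is only an example
    and may differ or be missing), the count will usually be larger,
    because bytes such as 0xE9 (é) are letters in that encoding.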

    --
    James Kanze (GABI Software) email:james.kanze@gmail.com
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

  4. Re: flex: doubt regarding the meaning of some character classes

    James Kanze wrote:
    > On Oct 20, 12:37 am, Martien Verbruggen
    > wrote:
    >> On 19 Oct 2008 21:02:20 GMT,
    >> Rui Maciel wrote:

    >
    >>> After browsing the flex manual I've stumbled on the patterns
    >>> section where it is said that patterns can be expressed with
    >>> the help of character class expressions. The patterns are as
    >>> follows:

    >
    >>> [:alnum:] [:alpha:] [:blank:]
    >>> [:cntrl:] [:digit:] [:graph:]
    >>> [:lower:] [:print:] [:punct:]
    >>> [:space:] [:upper:] [:xdigit:]

    >
    >>> Some of those patterns appear to be straightforward, but I
    >>> don't have a clue regarding the exact meaning of a whole
    >>> bunch of them. To make matters worse, it appears that there
    >>> is absolutely no information covering that, neither in the
    >>> flex manual nor anywhere else.

    >
    >> The POSIX character classes all correspond to functions of the
    >> same name with 'is' prepended. So, the character class
    >> [:digit:] has a corresponding C function isdigit(), which
    >> you can look up for the definition.

    >
    > Which doesn't really answer much. The C functions are locale
    > dependent; is this also true of the flex character classes? And
    > if so, which locale: the one when flex is run, or the one when
    > the generated program is run? (I can make some guesses, but
    > this is the sort of thing which absolutely must be documented
    > for any serious program.)


    The flex info manual (that accompanies the distribution) is, I believe,
    completely clear about it (the flex character classes are locale
    dependent and depend on the locale in which flex is run) - see the
    chapter entitled "Patterns".

    Robert

  5. Re: flex: doubt regarding the meaning of some character classes

    Robert Harris wrote:
    > The flex info manual (that accompanies the distribution) is, I believe,
    > completely clear about it (the flex character classes are locale
    > dependent and depend on the locale in which flex is run) - see the
    > chapter entitled "Patterns".


    It seems to say that, but when I look at the generated code, it has only
    a table for POSIX codes.

    --
    Thomas E. Dickey
    http://invisible-island.net
    ftp://invisible-island.net

  6. Re: flex: doubt regarding the meaning of some character classes

    On Mon, 20 Oct 2008 09:37:39 +1100, Martien Verbruggen wrote:

    > The POSIX character classes all correspond to functions of the same name
    > with 'is' prepended. So, the character class [:digit:] has a
    > corresponding C function isdigit(), which you can look up for the
    > definition.


    Thanks, Martien! Knowing that those character classes were defined in the
    POSIX standards did the trick. After a quick search I even stumbled on
    the following wikipedia article:

    http://en.wikipedia.org/wiki/Regular...racter_classes

    It has a nifty table that lists both the expanded regex expression and
    the general description of that pattern.
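    The expansions in such a table can also be generated from the <ctype.h>
    predicates themselves. A minimal C sketch, assuming the class is
    evaluated byte by byte over 0..255 in the current LC_CTYPE locale;
    expand_class() is a hypothetical helper, and its output is meant for
    reading, not for pasting into a regex unescaped (characters such as
    "]", "^" and "-" would need care).

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Appends byte c to out at *pos: graphic bytes as themselves, everything
   else as \xHH. Returns -1 if out (capacity cap) is too small. */
static int append_byte(char *out, size_t *pos, size_t cap, int c)
{
    char tmp[8];
    int n = isgraph(c) ? snprintf(tmp, sizeof tmp, "%c", c)
                       : snprintf(tmp, sizeof tmp, "\\x%02X", (unsigned)c);
    if (n < 0 || *pos + (size_t)n >= cap)
        return -1;
    memcpy(out + *pos, tmp, (size_t)n + 1);   /* copies the terminating NUL too */
    *pos += (size_t)n;
    return 0;
}

/* Writes the members of pred (bytes 0..255, current locale) into out as the
   body of a bracket expression, e.g. "0-9A-Za-z" for isalnum in the "C"
   locale. Runs of two or more consecutive bytes become lo-hi.
   Returns 0 on success, -1 if out is too small. */
static int expand_class(int (*pred)(int), char *out, size_t cap)
{
    assert(pred != NULL && out != NULL && cap > 0);
    size_t pos = 0;
    out[0] = '\0';
    for (int c = 0; c <= 255; c++) {
        if (!pred(c))
            continue;
        int lo = c;
        while (c < 255 && pred(c + 1))   /* extend the run of members */
            c++;
        if (append_byte(out, &pos, cap, lo) != 0)
            return -1;
        if (c > lo) {
            if (pos + 1 >= cap)
                return -1;
            out[pos++] = '-';
            out[pos] = '\0';
            if (append_byte(out, &pos, cap, c) != 0)
                return -1;
        }
    }
    return 0;
}
```

    In the "C" locale this gives "0-9A-Fa-f" for isxdigit and
    "!-/:-@[-`{-~" for ispunct; for isspace the members are all
    non-graphic, so they come out as "\x09-\x0D\x20".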


    Once again thanks, Martien. Kudos!
    Rui Maciel

  7. Re: flex: doubt regarding the meaning of some character classes

    On Oct 20, 12:42 pm, Robert Harris
    wrote:
    > James Kanze wrote:
    > > On Oct 20, 12:37 am, Martien Verbruggen
    > > wrote:
    > >> On 19 Oct 2008 21:02:20 GMT,
    > >> Rui Maciel wrote:


    > >>> After browsing the flex manual I've stumbled on the
    > >>> patterns section where it is said that patterns can be
    > >>> expressed with the help of character class expressions.
    > >>> The patterns are as follows:


    > >>> [:alnum:] [:alpha:] [:blank:]
    > >>> [:cntrl:] [:digit:] [:graph:]
    > >>> [:lower:] [:print:] [:punct:]
    > >>> [:space:] [:upper:] [:xdigit:]


    > >>> Some of those patterns appear to be straightforward, but I
    > >>> don't have a clue regarding the exact meaning of a whole
    > >>> bunch of them. To make matters worse, it appears that
    > >>> there is absolutely no information covering that, neither
    > >>> in the flex manual nor anywhere else.


    > >> The POSIX character classes all correspond to functions of
    > >> the same name with 'is' prepended. So, the character class
    > >> [:digit:] has a corresponding C function isdigit(), which
    > >> you can look up for the definition.


    > > Which doesn't really answer much. The C functions are
    > > locale dependent; is this also true of the flex character
    > > classes? And if so, which locale: the one when flex is run,
    > > or the one when the generated program is run? (I can make
    > > some guesses, but this is the sort of thing which absolutely
    > > must be documented for any serious program.)


    > The flex info manual (that accompanies the distribution) is, I
    > believe, completely clear about it (the flex character classes
    > are locale dependent and depend on the locale in which flex is
    > run) - see the chapter entitled "Patterns".


    Yes. That makes sense, but it does mean that if you deliver
    flex source code, you need to warn your users about this,
    possibly specifying that they must use the correct locale (or
    setting it in your makefile).

    On the other hand, the isxxx functions aren't really of much use
    today, now that everything is UTF-8. Not sure how flex handles
    this issue, however.
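    The UTF-8 concern can be made concrete. The isxxx() functions classify
    one byte at a time, while UTF-8 encodes é as the two bytes 0xC3 0xA9.
    A minimal C sketch, assuming the program stays in the default "C"
    locale, where neither byte is a letter; alpha_prefix_len() is a
    hypothetical helper that scans the way a byte-oriented [[:alpha:]]+
    rule would.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Length of the longest prefix of s whose bytes all satisfy isalpha() in
   the current locale, i.e. what a byte-oriented [[:alpha:]]+ would match. */
static size_t alpha_prefix_len(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    /* Cast to unsigned char: passing a negative char to isalpha() is undefined. */
    while (s[n] != '\0' && isalpha((unsigned char)s[n]))
        n++;
    return n;
}
```

    With "caf\xC3\xA9" (the UTF-8 spelling of "café") the result in the
    "C" locale is 3, not 4: the accented letter is two bytes and neither
    is accepted on its own.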

    --
    James Kanze (GABI Software) email:james.kanze@gmail.com

  8. Re: flex: doubt regarding the meaning of some character classes

    On Oct 20, 4:56 pm, dic...@invisible-island.net (Thomas E. Dickey)
    wrote:
    > Robert Harris wrote:
    > > The flex info manual (that accompanies the distribution) is,
    > > I believe, completely clear about it (the flex character
    > > classes are locale dependent and depend on the locale in
    > > which flex is run) - see the chapter entitled "Patterns".


    > It seems to say that, but when I look at the generated code,
    > it has only a table for POSIX codes.


    So you ran flex in the POSIX locale. The manual is actually
    very precise about this, presenting it as a warning.

    --
    James Kanze (GABI Software) email:james.kanze@gmail.com

  9. Re: flex: doubt regarding the meaning of some character classes

    James Kanze wrote:
    > On Oct 20, 4:56 pm, dic...@invisible-island.net (Thomas E. Dickey)
    > wrote:
    >> Robert Harris wrote:
    >> > The flex info manual (that accompanies the distribution) is,
    >> > I believe, completely clear about it (the flex character
    >> > classes are locale dependent and depend on the locale in
    >> > which flex is run) - see the chapter entitled "Patterns".

    >
    >> It seems to say that, but when I look at the generated code,
    >> it has only a table for POSIX codes.

    >
    > So you ran flex in the POSIX locale. The manual is actually
    > very precise about this, presenting it as a warning.


    Perhaps you can quote the relevant paragraph, so we can agree on what it says.

    --
    Thomas E. Dickey

  10. Re: flex: doubt regarding the meaning of some character classes

    James Kanze wrote:
    > On Oct 20, 4:56 pm, dic...@invisible-island.net (Thomas E. Dickey)
    > wrote:
    >> Robert Harris wrote:
    >> > The flex info manual (that accompanies the distribution) is,
    >> > I believe, completely clear about it (the flex character
    >> > classes are locale dependent and depend on the locale in
    >> > which flex is run) - see the chapter entitled "Patterns".

    >
    >> It seems to say that, but when I look at the generated code,
    >> it has only a table for POSIX codes.

    >
    > So you ran flex in the POSIX locale. The manual is actually
    > very precise about this, presenting it as a warning.


    hmm - "info", from "new" flex (new bugs, no improvements ;-).

    I hadn't looked closely at the documentation; I see that it does put
    that in the "info" file. Perhaps someone will improve that aspect
    sometime.

    --
    Thomas E. Dickey

  11. Re: flex: doubt regarding the meaning of some character classes

    On Oct 23, 5:29 pm, dic...@invisible-island.net (Thomas E. Dickey)
    wrote:
    > James Kanze wrote:
    > > On Oct 20, 4:56 pm, dic...@invisible-island.net (Thomas E. Dickey)
    > > wrote:
    > >> Robert Harris wrote:
    > >> > The flex info manual (that accompanies the distribution) is,
    > >> > I believe, completely clear about it (the flex character
    > >> > classes are locale dependent and depend on the locale in
    > >> > which flex is run) - see the chapter entitled "Patterns".


    > >> It seems to say that, but when I look at the generated code,
    > >> it has only a table for POSIX codes.


    > > So you ran flex in the POSIX locale. The manual is actually
    > > very precise about this, presenting it as a warning.


    > Perhaps you can quote the relevant paragraph, so we can agree
    > on what it says.


    A word of caution. Character classes are expanded
    immediately when seen in the flex input. This means the
    character classes are sensitive to the locale in which
    flex is executed, and the resulting scanner will not be
    sensitive to the runtime locale. This may or may not be
    desirable.

    Exactly where Robert said I'd find it, in the section on
    Patterns. Immediately after the presentation of [:...:].
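    The behaviour the manual describes, expanding the class once when flex
    reads the rule, can be imitated in a few lines of C. A minimal sketch
    under simplifying assumptions: the class is frozen into a 256-entry
    membership table at "generation time" under a chosen locale, and the
    "scanner" only consults that table, so changing LC_CTYPE afterwards has
    no effect. byte_set, expand_under_locale() and set_contains() are
    hypothetical names; flex's own internal representation is different.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/* Membership table for one character class, one entry per byte value. */
typedef struct {
    bool member[256];
} byte_set;

/* "Generation time": evaluate pred for every byte under `locale` and store
   the answers. The previous LC_CTYPE setting is restored afterwards.
   Returns false if the locale is unavailable. */
static bool expand_under_locale(byte_set *set, int (*pred)(int), const char *locale)
{
    assert(set != NULL && pred != NULL && locale != NULL);
    char saved[256];
    const char *cur = setlocale(LC_CTYPE, NULL);   /* query, do not change */
    if (cur == NULL || strlen(cur) >= sizeof saved)
        return false;
    strcpy(saved, cur);   /* copy: the returned string may be overwritten */
    if (setlocale(LC_CTYPE, locale) == NULL)
        return false;
    for (int c = 0; c <= 255; c++)
        set->member[c] = pred(c) != 0;
    setlocale(LC_CTYPE, saved);
    return true;
}

/* "Run time": no ctype call, hence no dependence on the runtime locale. */
static bool set_contains(const byte_set *set, unsigned char c)
{
    return set->member[c];
}
```

    The trade-off is the one the manual points out: the table is fast and
    predictable, but a scanner generated under the "C" locale will keep
    rejecting 0xE9 as a letter even when it later runs in a Latin-1 locale.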

    --
    James Kanze (GABI Software) email:james.kanze@gmail.com

  12. Re: flex: doubt regarding the meaning of some character classes

    On Oct 24, 4:21 am, James Kanze wrote:
    > On Oct 23, 5:29 pm, dic...@invisible-island.net (Thomas E. Dickey)

    ...
    > Exactly where Robert said I'd find it, in the section on
    > Patterns. Immediately after the presentation of [:...:].


    However - back to my comment (documentation is one thing, reality
    another).

    The note (in flex "2.5.35", for the sake of discussion) does not
    correspond to what I'm seeing. A simple lex file with just this pattern

    WORD [[:alpha:]]([[:alnum:]])*

    produces the same file whether my current locale is "C", "en_US" or
    "fr_FR". The relevant information for character classes appears to be
    in a table beginning

    static yyconst flex_int32_t yy_ec[256] =

    (The same observation applies to "2.5.4a".)
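    For readers who have not looked inside a generated scanner: a table
    like yy_ec maps each input byte to an "equivalence class", so that
    bytes the rules never distinguish share one column of the transition
    tables. A much simplified C sketch of that idea for the WORD rule
    above, assuming the classes are evaluated once in the current locale;
    ec, build_ec(), next_state and match_word() are hypothetical names, and
    flex's real tables and numbering differ. If the classes were evaluated
    under a locale where some byte above 0x7F is a letter, a table built
    this way would place that byte in the letter class.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Equivalence classes for WORD [[:alpha:]]([[:alnum:]])*: the rule only
   distinguishes letters, alnum-but-not-alpha bytes (the digits in the
   "C" locale), and everything else. */
enum { EC_OTHER = 0, EC_ALPHA = 1, EC_DIGIT = 2, N_EC = 3 };

/* Byte -> equivalence class, similar in spirit to a yy_ec table. */
static unsigned char ec[256];

/* Fill ec[] once, under whatever LC_CTYPE locale is current. */
static void build_ec(void)
{
    for (int c = 0; c <= 255; c++) {
        if (isalpha(c))
            ec[c] = EC_ALPHA;
        else if (isalnum(c))
            ec[c] = EC_DIGIT;
        else
            ec[c] = EC_OTHER;
    }
}

/* DFA over equivalence classes; columns are EC_OTHER, EC_ALPHA, EC_DIGIT. */
enum { S_START = 0, S_WORD = 1, S_DEAD = 2 };
static const unsigned char next_state[3][N_EC] = {
    /* OTHER   ALPHA   DIGIT */
    { S_DEAD, S_WORD, S_DEAD },   /* start: must begin with a letter */
    { S_DEAD, S_WORD, S_WORD },   /* in word: letters and digits continue */
    { S_DEAD, S_DEAD, S_DEAD },   /* dead: stays dead */
};

/* Length of the longest prefix of s matching WORD, or 0 if none.
   build_ec() must have been called first. */
static size_t match_word(const char *s)
{
    assert(s != NULL);
    int state = S_START;
    size_t i = 0, longest = 0;
    while (s[i] != '\0') {
        state = next_state[state][ec[(unsigned char)s[i]]];
        if (state == S_DEAD)
            break;
        i++;
        if (state == S_WORD)
            longest = i;   /* remember the last accepting position */
    }
    return longest;
}
```

    With three classes the transition table needs three columns instead of
    256, which is the point of the equivalence-class step.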

    --
    Thomas E. Dickey

  13. Re: flex: doubt regarding the meaning of some character classes

    On Oct 24, 11:13 pm, dickey wrote:
    > On Oct 24, 4:21 am, James Kanze wrote:


    > > On Oct 23, 5:29 pm, dic...@invisible-island.net (Thomas E.
    > > Dickey)

    > ...
    > > Exactly where Robert said I'd find it, in the section on
    > > Patterns. Immediately after the presentation of [:...:].


    > However - back to my comment (documentation is one thing,
    > reality another).


    > The note (in flex "2.5.35", for the sake of discussion) does
    > not correspond to what I'm seeing. A simple lex file with just
    > this pattern


    > WORD [[:alpha:]]([[:alnum:]])*


    > produces the same file whether my current locale is "C" or
    > "en_US" or "fr_FR".


    I'd rank that as a bug then.

    --
    James Kanze (GABI Software) email:james.kanze@gmail.com
