[9fans] yacc question - Plan9


  1. [9fans] yacc question

    How do I best tell yacc to expect a Rune? For example, defining a
    “colon-equals” assignment operator with
    %right L'≔'
    then adding it to the grammar with
    expr: NUMBER
    | VAR { $$ = mem[$1]; }
    | VAR L'≔' expr { $$ = mem[$1] = $3; }
    | …
    results in the yacc error message:
    fatal error:must specify type for ≔, /usr/chesky/src/hak/hoc/hoc2.y:23
    Expressing the character constant as '≔' rather than L'≔' gets rid of
    that error message, but now I’m not confident that it’s looking for
    the correct value; I find no reference to the character ≔ or to the
    numbers 2254 or 8788 in y.tab.c. (Although y.output does have it in
    the proper place.) More disturbing, if I say
    %right L'≔'

    expr: NUMBER
    | VAR { $$ = mem[$1]; }
    | VAR '≔' expr { $$ = mem[$1] = $3; }
    (keeping the L prefix on one instance), yacc is satisfied as well.

    What is the “right” way of using Rune literals in yacc?

    --Joel


  2. Re: [9fans] yacc question

    You need to treat non-ASCII UTF-8 the same way
    that you treat multiple characters. That is, you
    implement '≔' the same way you'd implement ':=' or '+=':
    in the lexer as a named symbol like NUMBER and VAR.

    Russ
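
A minimal sketch of the approach described in this reply, written in portable C rather than Plan 9 C: the lexer decodes the rune itself and hands the parser a named token, so the grammar would read `VAR ASSIGN expr` instead of using a character literal. The names ASSIGN, nextrune, and lex, and the token numbers, are assumptions for illustration; a real yylex would read from its input stream and take its token codes from y.tab.h.

```c
#include <stdio.h>

/* Hypothetical token codes; yacc would normally generate these in y.tab.h. */
enum { NUMBER = 257, VAR, ASSIGN };

/* Decode one UTF-8 sequence of 1 to 3 bytes (enough for U+2254) at *s,
   advance *s past it, and return the code point.
   Sketch only: no validation, no 4-byte sequences. */
long nextrune(const char **s)
{
	const unsigned char *p = (const unsigned char *)*s;
	long r;

	if (p[0] < 0x80) {
		r = p[0];
		p += 1;
	} else if ((p[0] & 0xE0) == 0xC0) {
		r = ((long)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
		p += 2;
	} else {
		r = ((long)(p[0] & 0x0F) << 12) | ((long)(p[1] & 0x3F) << 6) | (p[2] & 0x3F);
		p += 3;
	}
	*s = (const char *)p;
	return r;
}

/* Return ASSIGN both for the single rune U+2254 and for the two
   characters ":="; any other rune stands for itself. */
int lex(const char **s)
{
	long r = nextrune(s);

	if (r == 0x2254)
		return ASSIGN;
	if (r == ':' && **s == '=') {
		(*s)++;
		return ASSIGN;
	}
	return (int)r;
}
```

With a lexer shaped like this, "≔" and ":=" reach the parser as the same ASSIGN token, and the grammar never needs a non-ASCII character constant.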

  3. Re: [9fans] yacc question

    > You need to treat non-ASCII UTF-8 the same way
    > that you treat multiple characters. That is, you
    > implement '≔' the same way you'd implement ':=' or '+=':
    > in the lexer as a named symbol like NUMBER and VAR.


    Then what does yacc(1) mean when it says that yacc accepts UTF input?
    Why should the value for non-terminals start at 0xE000 if yylex can’t
    return 0x2254 and have yacc understand it?

    I’ve attached a version of hoc1.y (from Kernighan & Pike) to which
    I’ve attempted to add ‘×’ as a valid multiplication symbol. Depending
    on how I write the constants, yacc may or may not accept the grammar,
    and when it does accept it the resulting program suicides at start-up.

    --Joel


  4. charstod (Was re: [9fans] yacc question)

    As an aside, my yylex code included:
    if(c == '.' || (isascii(c) && isdigit(c))){
        Bungetc(src);
        yylval = charstod(getc, 0);
        Bungetc(src);
        return NUMBER;
    }
    where getc was a simple wrapper around Bgetc(2) for compatibility with
    charstod(2). Is that the right way to handle this, or is there a cast
    that would allow me to say
    yylval = charstod(Bgetc, src);
    more directly?

    --Joel


  5. Re: [9fans] yacc question

    > Then what does yacc(1) mean when it says that yacc accepts UTF input?
    > Why should the value for non-terminals start at 0xE000 if yylex can't
    > return 0x2254 and have yacc understand it?


    I learn something new every day! Neat.
    I've never chosen to depend on that behavior.

    I just tried your program and it worked fine for
    me once I changed yylex to return actual Unicode
    values instead of byte values. (There is a difference
    between Bgetc and Bgetrune.)
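
To make the byte-versus-rune distinction concrete, here is a small sketch in portable C. A byte-at-a-time reader sees the UTF-8 encoding of '≔' as 0xE2, 0x89, 0x94, while a rune reader sees the single value 0x2254 (8788 decimal). The functions decoderune and show are hypothetical stand-ins, not the bio(2) routines, and the decoder skips validation.

```c
#include <stdio.h>

/* Stand-in for a rune reader: decode one UTF-8 sequence of 1 to 3 bytes
   at p into *r and return the number of bytes consumed.
   Sketch only: no validation and no 4-byte sequences. */
int decoderune(const unsigned char *p, long *r)
{
	if (p[0] < 0x80) {
		*r = p[0];
		return 1;
	}
	if ((p[0] & 0xE0) == 0xC0) {
		*r = ((long)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
		return 2;
	}
	*r = ((long)(p[0] & 0x0F) << 12) | ((long)(p[1] & 0x3F) << 6) | (p[2] & 0x3F);
	return 3;
}

/* Print what a byte reader and a rune reader each see at the start of s. */
void show(const char *s)
{
	const unsigned char *p = (const unsigned char *)s;
	long r;
	int n = decoderune(p, &r);

	printf("first byte 0x%02X, rune 0x%04lX (%d bytes)\n",
		(unsigned)p[0], (unsigned long)r, n);
}
```

If yylex returns the first column, the parser is handed three values that match nothing in the grammar; returning the second lets a token written as '≔' or '×' match.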

    > I've attached a version of hoc1.y (from Kernighan & Pike) to which
    > I've attempted to add '×' as a valid multiplication symbol. Depending
    > on how I write the constants, yacc may or may not accept the grammar,
    > and when it does accept it the resulting program suicides at start-up.


    Oh, I did have to fix this too. But this has nothing to
    do with Unicode. Use '*' and it will still die. This one
    you should be able to figure out on your own, with help
    from acid or db.

    Russ

  6. Re: [9fans] yacc question

    > I just tried your program and it worked fine for
    > me once I changed yylex to return actual Unicode
    > values instead of byte values. (There is a difference
    > between Bgetc and Bgetrune.)


    D’oh!

    > > Depending on how I write the constants, yacc may or may not accept
    > > the grammar, and when it does accept it the resulting program
    > > suicides at start-up.

    >
    > Oh, I did have to fix this too. But this has nothing to do with
    > Unicode. Use '*' and it will still die. This one you should be able
    > to figure out on your own, with help from acid or db.


    Can I get a hint on how to use these for my program?

    cpu% hoc1
    hoc1 38691: suicide: sys: trap: fault write addr=0x10 pc=0x000068d7
    cpu% db 38691
    386 binary
    page fault
    /sys/src/libbio/binit.c:66 Binits+c0/ MOVL $1,10(BP)

    But I can’t tell how I’m using Binit(2) wrong—

    —never mind; added
    src = malloc(sizeof *src);
    and all is well. Now I need to guess what was different in the
    version of the code that worked…

    Thanks for the help,

    --Joel
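
The fix above follows a general pattern: an initializer that writes into caller-owned storage needs that storage to exist first. The fault address 0x10 in the db output is consistent with a store through a nil pointer to a field at a small offset. The sketch below uses a hypothetical Buf type and bufinit/bufopen functions as stand-ins; it does not reproduce the real Biobuf layout or the Binit signature.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a buffered-I/O handle; not the real Biobuf. */
typedef struct Buf Buf;
struct Buf {
	int fd;
	int state;
};

/* Stand-in initializer: it fills in storage that the caller must own.
   The nil check is an addition for this sketch; in the thread the
   library call faulted instead. */
int bufinit(Buf *b, int fd)
{
	if (b == NULL)
		return -1;
	b->fd = fd;
	b->state = 1;
	return 0;
}

/* Allocate the handle first, then initialize it. Returns NULL on failure. */
Buf *bufopen(int fd)
{
	Buf *b = malloc(sizeof *b);

	if (b != NULL && bufinit(b, fd) < 0) {
		free(b);
		b = NULL;
	}
	return b;
}
```

Declaring the handle as an object rather than a pointer and passing its address would avoid the allocation altogether.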


  7. Re: [9fans] yacc question

    It's lex that doesn't understand UTF, though I did use
    an encoding trick to smuggle UTF through lex when I once
    wanted to.

  8. Re: [9fans] yacc question

    > I did use an encoding trick to smuggle UTF through lex when I once
    > wanted to.


    Was this just having ‘宁静’ match ‘..’ or something more clever?

    --Joel


  9. Re: [9fans] yacc question

    I don't have access to the code currently, but I believe the trick was
    to have the function that lex calls to get its next input character
    notice the start of a non-ASCII UTF sequence, read the whole rune,
    convert it to \033 followed by the 4 hex digits of the rune's value,
    and pass those bytes consecutively to lex. Then yylex() would do the
    reverse translation from escaped hex back to a rune, so yacc's parser
    would see full runes.
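
The two translations described above might look roughly like the following in portable C. This is a sketch under assumptions, not the original code, which the poster did not have to hand: the names escrune and unescrune, the use of uppercase hex, and the limit of runes to 16 bits are all illustrative choices.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

enum { ESC = 033 };

/* Input side: write rune r (truncated to 16 bits) into buf as ESC followed
   by 4 hex digits. buf must hold at least 6 bytes.
   Returns the number of bytes written, not counting the terminating NUL. */
int escrune(long r, char *buf)
{
	return snprintf(buf, 6, "%c%04lX", ESC, (unsigned long)(r & 0xFFFF));
}

/* Output side: if s starts with ESC and 4 hex digits, store the rune in *r
   and return 5; otherwise store the single byte at s and return 1. */
int unescrune(const char *s, long *r)
{
	char hex[5];
	int i;

	if (s[0] == ESC) {
		for (i = 0; i < 4 && isxdigit((unsigned char)s[1 + i]); i++)
			hex[i] = s[1 + i];
		if (i == 4) {
			hex[4] = '\0';
			*r = strtol(hex, NULL, 16);
			return 5;
		}
	}
	*r = (unsigned char)s[0];
	return 1;
}
```

Between the two steps, the scanner only ever sees ASCII bytes, so a pattern can match the escape sequence like any other five-character string.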


  10. Re: [9fans] yacc question

    cool hack, ESCAPE ESCAPE ... too much confusion.

    On 2/5/07, geoff@plan9.bell-labs.com wrote:
    > I don't have access to the code currently, but I believe the trick was
    > to have the function that lex calls to get its next input character
    > notice the start of a non-ASCII UTF sequence, read the whole rune,
    > convert it to \033 followed by the 4 hex digits of the rune's value,
    > and pass those bytes consecutively to lex. Then yylex() would do the
    > reverse translation from escaped hex back to a rune, so yacc's parser
    > would see full runes.