[antlr-interest] ANTLR 3.1: Generated code for syntactic predicate in lexer rule does not follow the order of the alternatives

zosrothko zosrothko at orange.fr
Tue Aug 19 13:21:32 PDT 2008


Hi

The following snippet


fragment
NL : ('\n'| '\r' '\n' );
fragment
ESC : '\'' '\'';
STRING : '\'' ( ESC | ~'\'' )* '\'' { };
fragment
WS : ( ' ' | '\t' | ';' | NL);
SPACE : (WS)+ {$channel=HIDDEN;};
fragment
PLUS_CHAR : '+';
fragment
MINUS_CHAR : '-';
fragment
STAR_CHAR : '*';
fragment
SLASH_CHAR : '/';
fragment
DOT_CHAR : '.';
fragment
COMMA_CHAR : ',';
fragment
PLUS : '+' WS;
fragment
MINUS : '-' WS;
fragment
STAR : '*' WS;
fragment
SLASH : '/' WS;
fragment
DOT : '.' (WS|EOF);
fragment
COMMA : ',' WS;
 
PLUS_OR_CHAR : ('+' WS)=> PLUS { $type=PLUS; }
| PLUS_CHAR { $type=PLUS_CHAR; }
;
MINUS_OR_CHAR : ('-' WS)=> MINUS { $type=MINUS; }
| MINUS_CHAR { $type=MINUS_CHAR; }
;
STAR_OR_CHAR : ('*' WS)=> STAR { $type=STAR; }
| STAR_CHAR { $type=STAR_CHAR; }
;
SLASH_OR_CHAR : ('/' WS)=> SLASH { $type=SLASH; }
| SLASH_CHAR { $type=SLASH_CHAR; }
;
DOT_OR_CHAR : ('.' (WS|EOF))=> DOT { $type=DOT; }
| DOT_CHAR { $type=DOT_CHAR; }
;
COMMA_OR_CHAR : (',' WS)=> COMMA { $type=COMMA; }
| COMMA_CHAR { $type=COMMA_CHAR; }
;
fragment
DIGIT : '0'..'9';
fragment
SIGN : ( PLUS_CHAR | MINUS_CHAR );
fragment
SEPARATOR : ( DOT_CHAR | COMMA_CHAR );
fragment
INTEGER : (SIGN)? (DIGIT)+;
fragment
DECIMAL : (SIGN)? (DIGIT)* SEPARATOR (DIGIT)+ ;
 
INTEGER_OR_DECIMAL :
( (INTEGER WS) => INTEGER { $type=INTEGER; }
| (INTEGER DOT) => INTEGER { $type=INTEGER; }
| DECIMAL { $type=DECIMAL; }
)
;

produces this Java code

  int alt17=3;
  switch ( input.LA(1) ) {

  .....
  case '0':
  case '1':
  case '2':
  case '3':
  case '4':
  case '5':
  case '6':
  case '7':
  case '8':
  case '9':
  {
    int LA17_2 = input.LA(2);
 
    if ( (LA17_2==','||LA17_2=='.'||(LA17_2>='0' && LA17_2<='9')) ) {
      alt17=3;
    }
    else if ( (synpred7_Cobol()) ) {
      alt17=1;
    }
    else if ( (synpred8_Cobol()) ) {
      alt17=2;
    }
    else {
     if (state.backtracking>0) {state.failed=true; return ;}
       NoViableAltException nvae =
        new NoViableAltException("", 17, 2, input);
 
        throw nvae;
    }
  }

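This does not correspond to the order of the alternatives in the rule INTEGER_OR_DECIMAL: the plain lookahead test for alternative 3 is checked before the two syntactic predicates. As a result, the input '1.;' is resolved to a DECIMAL, whereas it should be resolved as an INTEGER, since the DIGIT is followed by a DOT ('.' followed by WS) and not by a DOT_CHAR.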
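To make the difference concrete, here is a small stand-alone Java sketch that models only the prediction step for the digit case of decision 17. It is not ANTLR-generated code: the class AltOrderDemo and all of its method names are hypothetical, and the two predicate methods are hand-written approximations of synpred7_Cobol() and synpred8_Cobol() based on the fragment rules above. It assumes the input starts with a digit.

```java
// Hypothetical model of decision 17 (digit case); hand-written, not ANTLR output.
final class AltOrderDemo {

    // WS fragment: ' ' | '\t' | ';' | NL, where NL is '\n' or '\r' '\n'.
    static boolean isWs(String s, int i) {
        if (i >= s.length()) return false;
        char c = s.charAt(i);
        if (c == ' ' || c == '\t' || c == ';' || c == '\n') return true;
        return c == '\r' && i + 1 < s.length() && s.charAt(i + 1) == '\n';
    }

    // INTEGER fragment: (SIGN)? (DIGIT)+. Returns the index after the match, or -1.
    static int integerEnd(String s) {
        int i = 0;
        if (i < s.length() && (s.charAt(i) == '+' || s.charAt(i) == '-')) i++;
        int start = i;
        while (i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9') i++;
        return i > start ? i : -1;
    }

    // Approximates synpred7: (INTEGER WS)=>
    static boolean synpred7(String s) {
        int end = integerEnd(s);
        return end >= 0 && isWs(s, end);
    }

    // Approximates synpred8: (INTEGER DOT)=> with DOT : '.' (WS|EOF)
    static boolean synpred8(String s) {
        int end = integerEnd(s);
        if (end < 0 || end >= s.length() || s.charAt(end) != '.') return false;
        return end + 1 == s.length() || isWs(s, end + 1);
    }

    // The LA(2) test from the generated code: ',' or '.' or a digit.
    static boolean la2LooksDecimal(String s) {
        if (s.length() < 2) return false;
        char c = s.charAt(1);
        return c == ',' || c == '.' || (c >= '0' && c <= '9');
    }

    // Order used by the generated code: LA(2) test first, predicates after.
    static int predictGeneratedOrder(String s) {
        if (la2LooksDecimal(s)) return 3;
        if (synpred7(s)) return 1;
        if (synpred8(s)) return 2;
        return 0; // stands in for NoViableAltException
    }

    // Order of the alternatives as written in INTEGER_OR_DECIMAL.
    static int predictDeclaredOrder(String s) {
        if (synpred7(s)) return 1;
        if (synpred8(s)) return 2;
        if (la2LooksDecimal(s)) return 3;
        return 0; // stands in for NoViableAltException
    }

    public static void main(String[] args) {
        String input = "1.;";
        System.out.println("generated order picks alt " + predictGeneratedOrder(input));
        System.out.println("declared order picks alt " + predictDeclaredOrder(input));
    }
}
```

With the input '1.;' this model picks alternative 3 (DECIMAL) in the generated order and alternative 2 (INTEGER) in the declared order, which is the discrepancy described above.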



I would expect the generated code to follow the order of the alternatives in the rule INTEGER_OR_DECIMAL, like this:
 {
    int LA17_2 = input.LA(2);
 
    if ( (synpred7_Cobol()) ) {
      alt17=1;
    }
    else if ( (synpred8_Cobol()) ) {
      alt17=2;
    }
    else if ( (LA17_2==','||LA17_2=='.'||(LA17_2>='0' && LA17_2<='9')) ) {
      alt17=3;
    }
    else {
     if (state.backtracking>0) {state.failed=true; return ;}
       NoViableAltException nvae =
        new NoViableAltException("", 17, 2, input);
 
        throw nvae;
    }
 }
which would parse the input '1.;' as an INTEGER.

Is this a bug?

zos