[antlr-interest] ANTLR 3.1: Predicate order evaluation of lexer rule in the generate code for the Java target.

Mon Aug 25 13:32:56 PDT 2008

Hi Francis,

You know, I think you're absolutely right. This is a bug, and a really
horrible one! The order of the alternatives must be preserved; it got
changed as I built the DFA in this case.  This grammar:

lexer grammar T;

INTEGER_OR_DECIMAL
         : ('0' ' ') => '0'
         | ('0' '.') => '0'
         | '0' '.' '0'
         ;

does not test the third alternative last!!! Yikes... I added a bug report:

http://www.antlr.org:8888/browse/ANTLR-316
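
For readers following along, here is a minimal hand-written sketch of what
"order preserved" would mean for the three alternatives above. It is not
ANTLR output: the class and method names are hypothetical, and each syntactic
predicate is approximated by a plain string-prefix check.

```java
public class MinimalPredicateOrder {
    // Alternative 1: ('0' ' ') => '0'   (predicate approximated by a prefix check)
    static boolean synpred1(String s) { return s.startsWith("0 "); }

    // Alternative 2: ('0' '.') => '0'
    static boolean synpred2(String s) { return s.startsWith("0."); }

    // Alternative 3 has no predicate; it matches '0' '.' '0'
    static boolean alt3Matches(String s) { return s.startsWith("0.0"); }

    // Try the alternatives in the order written in the grammar.
    // Returns the 1-based alternative number, or 0 if none is viable.
    static int predictInGrammarOrder(String s) {
        if (synpred1(s)) return 1;
        if (synpred2(s)) return 2;
        if (alt3Matches(s)) return 3;
        return 0;
    }

    public static void main(String[] args) {
        for (String input : new String[] {"0 ", "0.0", "0.;"}) {
            System.out.println("'" + input + "' -> alt " + predictInGrammarOrder(input));
        }
        // prints:
        // '0 ' -> alt 1
        // '0.0' -> alt 2
        // '0.;' -> alt 2
    }
}
```

Under this reading, the input "0.0" goes to alternative 2 because its
predicate is tested before the third alternative is considered at all.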

Ter

On Aug 25, 2008, at 4:07 AM, Francis ANDRE wrote:
> In the following grammar, the syntactic predicates of the rule
> INTEGER_OR_DECIMAL are generated in an order different from the one
> specified, i.e., the order of the alternatives in the rule. Did
> I misunderstand something?
>
> fragment
> NL : ('\n'| '\r' '\n' );
> fragment
> ESC : '\'' '\'';
> STRING : '\'' ( ESC | ~'\'' )* '\'' { };
> fragment
> WS : ( ' ' | '\t' | ';' | NL);
> SPACE : (WS)+ {$channel=HIDDEN;};
> fragment
> PLUS_CHAR : '+';
> fragment
> MINUS_CHAR : '-';
> fragment
> STAR_CHAR : '*';
> fragment
> SLASH_CHAR : '/';
> fragment
> DOT_CHAR : '.';
> fragment
> COMMA_CHAR : ',';
> fragment
> PLUS : '+' WS;
> fragment
> MINUS : '-' WS;
> fragment
> STAR : '*' WS;
> fragment
> SLASH : '/' WS;
> fragment
> DOT : '.' (WS|EOF);
> fragment
> COMMA : ',' WS;
> PLUS_OR_CHAR  : ('+' WS)=> PLUS { $type=PLUS; }     | PLUS_CHAR {
> $type=PLUS_CHAR; };
> MINUS_OR_CHAR : ('-' WS)=> MINUS { $type=MINUS; }   | MINUS_CHAR {
> $type=MINUS_CHAR; };
> STAR_OR_CHAR  : ('*' WS)=> STAR { $type=STAR; }     | STAR_CHAR {
> $type=STAR_CHAR; };
> SLASH_OR_CHAR : ('/' WS)=> SLASH { $type=SLASH; }   | SLASH_CHAR {
> $type=SLASH_CHAR; };
> DOT_OR_CHAR   : ('.' (WS|EOF))=> DOT { $type=DOT; } | DOT_CHAR {
> $type=DOT_CHAR; };
> COMMA_OR_CHAR : (',' WS)=> COMMA { $type=COMMA; }   | COMMA_CHAR {
> $type=COMMA_CHAR; };
> fragment
> DIGIT : '0'..'9';
> fragment
> SIGN : ( PLUS_CHAR | MINUS_CHAR );
> fragment
> SEPARATOR : ( DOT_CHAR | COMMA_CHAR );
> fragment
> INTEGER : (SIGN)? (DIGIT)+;
> fragment
> DECIMAL : (SIGN)? (DIGIT)* SEPARATOR (DIGIT)+ ;
> INTEGER_OR_DECIMAL :
>                     ( (INTEGER WS) => INTEGER { $type=INTEGER; }
>                     | (INTEGER DOT) => INTEGER { $type=INTEGER; }
>                     | DECIMAL { $type=DECIMAL; }
>                     )
> ;
> The produced Java code is:
> int alt17=3;
>   switch ( input.LA(1) ) {
>   .....
>   case '0':  case '1':  case '2':  case '3':  case '4':  case '5':
>   case '6':  case '7':  case '8':  case '9':
>   {   int LA17_2 = input.LA(2);
>     if ( (LA17_2==','||LA17_2=='.'||(LA17_2>='0' && LA17_2<='9')) ) {
>       alt17=3;
>     }
>     else if ( (synpred7_Cobol()) ) {
>       alt17=1;
>     }
>     else if ( (synpred8_Cobol()) ) {
>       alt17=2;
>     }
>     else {
>      if (state.backtracking>0) {state.failed=true; return ;}
>        NoViableAltException nvae =
>         new NoViableAltException("", 17, 2, input);
>         throw nvae;
>     }
>   }
> which does not correspond to the order of the alternatives in the
> rule INTEGER_OR_DECIMAL and resolves the input '1.;' to a DECIMAL,
> while it should resolve to an INTEGER since the DIGIT is followed by a
> DOT and not a DOT_CHAR.
>
> My expectation would be that the order of generated code follows the
> alternatives in the rule INTEGER_OR_DECIMAL as:
> {
>     int LA17_2 = input.LA(2);
>     if ( (synpred7_Cobol()) ) {
>       alt17=1;
>     }
>     else if ( (synpred8_Cobol()) ) {
>       alt17=2;
>     }
>     else if ( (LA17_2==','||LA17_2=='.'||(LA17_2>='0' && LA17_2<='9')) ) {
>       alt17=3;
>     }
>     else {
>      if (state.backtracking>0) {state.failed=true; return ;}
>        NoViableAltException nvae =
>         new NoViableAltException("", 17, 2, input);
>         throw nvae;
>     }
> }
> which parses the input '1.;' as an INTEGER.
>
> Francis
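
The difference between the two decision orders quoted above can be reproduced
outside ANTLR with a small stand-alone sketch. Everything here is an
assumption made for illustration: the class and helper names are hypothetical,
the predicates synpred7/synpred8 are re-implemented by hand from the grammar
rules, only the digit-first branch of the switch is modeled (so the optional
SIGN is ignored), and NL is simplified to a single '\n' or '\r'.

```java
public class CobolNumberDecision {
    // WS : ' ' | '\t' | ';' | NL   (NL simplified to '\n' or '\r' in this sketch)
    static boolean isWs(char c) {
        return c == ' ' || c == '\t' || c == ';' || c == '\n' || c == '\r';
    }

    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    // Index just past a leading (DIGIT)+ run, or -1 if the input has none.
    static int matchDigits(String s) {
        int i = 0;
        while (i < s.length() && isDigit(s.charAt(i))) i++;
        return i > 0 ? i : -1;
    }

    // Stand-in for synpred7: (INTEGER WS) =>
    static boolean synpred7(String s) {
        int i = matchDigits(s);
        return i >= 0 && i < s.length() && isWs(s.charAt(i));
    }

    // Stand-in for synpred8: (INTEGER DOT) =>  with DOT : '.' (WS|EOF)
    static boolean synpred8(String s) {
        int i = matchDigits(s);
        if (i < 0 || i >= s.length() || s.charAt(i) != '.') return false;
        return i + 1 == s.length() || isWs(s.charAt(i + 1));
    }

    // The LA(2) test from the generated code: ',' or '.' or a digit.
    static boolean la2LooksDecimal(String s) {
        if (s.length() < 2) return false;
        char c = s.charAt(1);
        return c == ',' || c == '.' || isDigit(c);
    }

    // Order found in the generated code: LA(2) test first, then the predicates.
    static int generatedOrder(String s) {
        if (la2LooksDecimal(s)) return 3;
        if (synpred7(s)) return 1;
        if (synpred8(s)) return 2;
        return 0;
    }

    // Order written in the grammar: predicates first, LA(2) test last.
    static int grammarOrder(String s) {
        if (synpred7(s)) return 1;
        if (synpred8(s)) return 2;
        if (la2LooksDecimal(s)) return 3;
        return 0;
    }

    public static void main(String[] args) {
        String input = "1.;";
        System.out.println("generated order -> alt " + generatedOrder(input)); // alt 3 (DECIMAL)
        System.out.println("grammar order   -> alt " + grammarOrder(input));   // alt 2 (INTEGER)
    }
}
```

In this sketch the two orders agree on inputs such as "1.5" (alternative 3)
and "1 " (alternative 1); they differ only when a predicate and the LA(2)
test both accept the input, as with '1.;'.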