[antlr-interest] ANTLR 3.1: Predicate order evaluation of lexer rule in the generate code for the Java target.
Terence Parr
parrt at antlr.org
Mon Aug 25 13:32:56 PDT 2008
Hi Francis,
You know, I think you're absolutely right. This is a bug, and a really
horrible bug! The order must be preserved; the order got changed when I
build the DFA in this case. This grammar:
lexer grammar T;
INTEGER_OR_DECIMAL
: ('0' ' ') => '0'
| ('0' '.') => '0'
| '0' '.' '0'
;
does not test the third alternative last!!! Yikes... I've added a bug:
http://www.antlr.org:8888/browse/ANTLR-316
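
To make "test the third alternative last" concrete, here is a minimal
sketch in Java. It is not ANTLR-generated code: the class and method
names are hypothetical, and it assumes that the third alternative is
predicted by looking at just the first two characters ('0' then '.'),
in the same way the generated code quoted below uses LA(2).

```java
class OrderedAlternatives {
    // Hypothetical viability tests for the three alternatives of
    // INTEGER_OR_DECIMAL in grammar T above.
    static boolean viable(int alt, String input) {
        switch (alt) {
            case 1: return input.startsWith("0 "); // ('0' ' ') =>
            case 2: return input.startsWith("0."); // ('0' '.') =>
            case 3: return input.startsWith("0."); // assumed 2-char lookahead for '0' '.' '0'
            default: return false;
        }
    }

    // Returns the first alternative in 'order' that is viable, or 0 if none is.
    static int firstViable(int[] order, String input) {
        for (int alt : order) {
            if (viable(alt, input)) {
                return alt;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        String input = "0.0";
        // Order as written in the grammar.
        System.out.println(firstViable(new int[] {1, 2, 3}, input));
        // Third alternative tested first.
        System.out.println(firstViable(new int[] {3, 1, 2}, input));
    }
}
```

With the order written in the grammar, the input "0.0" selects
alternative 2; once alternative 3 is tested first, the same input
selects alternative 3, so the outcome depends on the test order.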
Ter
On Aug 25, 2008, at 4:07 AM, Francis ANDRE wrote:
> In the following grammar, the syntactic predicates of the rule
> INTEGER_OR_DECIMAL are generated in an order different from the one
> specified, i.e., the order of the alternatives in the rule. Did
> I misunderstand something?
>
> fragment
> NL : ('\n'| '\r' '\n' );
> fragment
> ESC : '\'' '\'';
> STRING : '\'' ( ESC | ~'\'' )* '\'' { };
> fragment
> WS : ( ' ' | '\t' | ';' | NL);
> SPACE : (WS)+ {$channel=HIDDEN;};
> fragment
> PLUS_CHAR : '+';
> fragment
> MINUS_CHAR : '-';
> fragment
> STAR_CHAR : '*';
> fragment
> SLASH_CHAR : '/';
> fragment
> DOT_CHAR : '.';
> fragment
> COMMA_CHAR : ',';
> fragment
> PLUS : '+' WS;
> fragment
> MINUS : '-' WS;
> fragment
> STAR : '*' WS;
> fragment
> SLASH : '/' WS;
> fragment
> DOT : '.' (WS|EOF);
> fragment
> COMMA : ',' WS;
> PLUS_OR_CHAR : ('+' WS)=> PLUS { $type=PLUS; } | PLUS_CHAR {
> $type=PLUS_CHAR; };
> MINUS_OR_CHAR : ('-' WS)=> MINUS { $type=MINUS; } | MINUS_CHAR {
> $type=MINUS_CHAR; };
> STAR_OR_CHAR : ('*' WS)=> STAR { $type=STAR; } | STAR_CHAR {
> $type=STAR_CHAR; };
> SLASH_OR_CHAR : ('/' WS)=> SLASH { $type=SLASH; } | SLASH_CHAR {
> $type=SLASH_CHAR; };
> DOT_OR_CHAR : ('.' (WS|EOF))=> DOT { $type=DOT; } | DOT_CHAR {
> $type=DOT_CHAR; };
> COMMA_OR_CHAR : (',' WS)=> COMMA { $type=COMMA; } | COMMA_CHAR {
> $type=COMMA_CHAR; };
> fragment
> DIGIT : '0'..'9';
> fragment
> SIGN : ( PLUS_CHAR | MINUS_CHAR );
> fragment
> SEPARATOR : ( DOT_CHAR | COMMA_CHAR );
> fragment
> INTEGER : (SIGN)? (DIGIT)+;
> fragment
> DECIMAL : (SIGN)? (DIGIT)* SEPARATOR (DIGIT)+ ;
> INTEGER_OR_DECIMAL :
> ( (INTEGER WS) => INTEGER { $type=INTEGER; }
> | (INTEGER DOT) => INTEGER { $type=INTEGER; }
> | DECIMAL { $type=DECIMAL; }
> )
> ;
> The produced Java code is:
> int alt17=3;
> switch ( input.LA(1) ) {
> .....
> case '0': case '1': case '2': case '3': case '4':
> case '5': case '6': case '7': case '8': case '9':
>     {
>     int LA17_2 = input.LA(2);
>     if ( (LA17_2==','||LA17_2=='.'||(LA17_2>='0' && LA17_2<='9')) ) {
>         alt17=3;
>     }
>     else if ( (synpred7_Cobol()) ) {
>         alt17=1;
>     }
>     else if ( (synpred8_Cobol()) ) {
>         alt17=2;
>     }
>     else {
>         if (state.backtracking>0) {state.failed=true; return ;}
>         NoViableAltException nvae =
>             new NoViableAltException("", 17, 2, input);
>         throw nvae;
>     }
>     }
> which does not correspond to the order of the alternatives in the
> rule INTEGER_OR_DECIMAL and resolves the input '1.;' to a DECIMAL,
> whereas it should resolve to an INTEGER, since the DIGIT is followed
> by a DOT and not a DOT_CHAR.
>
> My expectation would be that the order of generated code follows the
> alternatives in the rule INTEGER_OR_DECIMAL as:
>     {
>     int LA17_2 = input.LA(2);
>     if ( (synpred7_Cobol()) ) {
>         alt17=1;
>     }
>     else if ( (synpred8_Cobol()) ) {
>         alt17=2;
>     }
>     else if ( (LA17_2==','||LA17_2=='.'||(LA17_2>='0' && LA17_2<='9')) ) {
>         alt17=3;
>     }
>     else {
>         if (state.backtracking>0) {state.failed=true; return ;}
>         NoViableAltException nvae =
>             new NoViableAltException("", 17, 2, input);
>         throw nvae;
>     }
>     }
> which parses the input '1.;' as an INTEGER.
>
> Francis
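
The two orderings quoted above can be compared with a small stand-alone
simulation. This is only a sketch under stated assumptions, not the
generated lexer: the class and method names are hypothetical stand-ins
for synpred7_Cobol() and synpred8_Cobol(), only the branch where LA(1)
is a digit is modelled, and the optional SIGN of INTEGER is left out.

```java
class IntegerOrDecimalDecision {
    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // WS : ' ' | '\t' | ';' | NL, with NL : '\n' | '\r' '\n'
    static boolean isWsAt(String s, int i) {
        if (i >= s.length()) {
            return false;
        }
        char c = s.charAt(i);
        return c == ' ' || c == '\t' || c == ';' || c == '\n'
            || (c == '\r' && i + 1 < s.length() && s.charAt(i + 1) == '\n');
    }

    // Index just past the leading run of digits, or -1 if there is none.
    static int endOfInteger(String s) {
        int i = 0;
        while (i < s.length() && isDigit(s.charAt(i))) {
            i++;
        }
        return i == 0 ? -1 : i;
    }

    // Stand-in for synpred7_Cobol(): (INTEGER WS) =>
    static boolean synpredIntegerWs(String s) {
        int end = endOfInteger(s);
        return end > 0 && isWsAt(s, end);
    }

    // Stand-in for synpred8_Cobol(): (INTEGER DOT) =>, with DOT : '.' (WS|EOF)
    static boolean synpredIntegerDot(String s) {
        int end = endOfInteger(s);
        if (end < 0 || end >= s.length() || s.charAt(end) != '.') {
            return false;
        }
        return end + 1 == s.length() || isWsAt(s, end + 1);
    }

    // The LA(2) test from the generated code that predicts alternative 3.
    static boolean la2PredictsDecimal(String s) {
        if (s.length() < 2) {
            return false;
        }
        char c = s.charAt(1);
        return c == ',' || c == '.' || isDigit(c);
    }

    // Test order found in the generated code; 0 means no viable alternative.
    static int asGenerated(String s) {
        if (la2PredictsDecimal(s)) return 3;
        if (synpredIntegerWs(s)) return 1;
        if (synpredIntegerDot(s)) return 2;
        return 0;
    }

    // Test order that follows the alternatives as written in the rule.
    static int inGrammarOrder(String s) {
        if (synpredIntegerWs(s)) return 1;
        if (synpredIntegerDot(s)) return 2;
        if (la2PredictsDecimal(s)) return 3;
        return 0;
    }

    public static void main(String[] args) {
        String input = "1.;";
        System.out.println("as generated:  alt " + asGenerated(input));
        System.out.println("grammar order: alt " + inGrammarOrder(input));
    }
}
```

For the input '1.;' the generated order picks alternative 3 (DECIMAL),
while the grammar order picks alternative 2 (INTEGER followed by DOT),
which matches the behaviour described in the message above. An input
such as '1.5' still reaches alternative 3 in both orderings.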