[antlr-interest] ANTLR4 synpred combination with (..)+ too greedy?

cd.barth at t-online.de
Tue Oct 30 02:16:20 PDT 2012


Using the following grammar

lexer grammar MyLexer;

WORD1 : ID1+;
WORD2 : ID2+;

fragment ID1 : {getCharPositionInLine()<2}?  [a-zA-Z];
fragment ID2 : {getCharPositionInLine()>=2}? [a-zA-Z];

WS : [ \t\r\n]+ -> skip ;

and printing the lexer tokens with

for (Token token : lexer.getAllTokens()) {
    // map the numeric token type to its symbolic name, e.g. WORD1
    int idx = token.getType();
    String tokenName = lexer.getTokenNames()[idx];
    System.out.format(" %-12s", tokenName);
    System.out.println(token);
}

for these two input lines

a cde
abcde

I get the following output:

WORD1       [@-1,0:0='a',<1>,1:0]
WORD2       [@-1,2:4='cde',<2>,1:2]

WORD1       [@-1,7:9='abc',<1>,2:0]
WORD2       [@-1,10:11='de',<2>,2:3]

And now my question:

Why is the letter c part of WORD2 in the first line, "a cde", but part of WORD1 in the second line, "abcde"?
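For comparison, here is a small plain-Java model (hypothetical, not ANTLR code) of the tokenization I expected from the grammar: a letter in column 0 or 1 satisfies the ID1 predicate, a letter in column 2 or later satisfies ID2, and a token keeps growing only while the same predicate holds. Anything outside [a-zA-Z] is simply skipped, which is a simplification of the WS rule. Under this model the second line should give WORD1 'ab' and WORD2 'cde'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, NOT ANTLR code: the tokenization the grammar is meant
// to produce. Columns are 0-based, like getCharPositionInLine().
class IntendedLexer {

    // Same letter class as [a-zA-Z] in the grammar.
    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Returns tokens for one input line, formatted as "TYPE 'text'".
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        int col = 0;
        while (col < line.length()) {
            if (!isLetter(line.charAt(col))) {  // simplification: skip anything else
                col++;
                continue;
            }
            int start = col;
            boolean word1 = col < 2;  // ID1 predicate holds at the token start
            // Consume letters while the same predicate holds at the CURRENT column.
            while (col < line.length()
                    && isLetter(line.charAt(col))
                    && (col < 2) == word1) {
                col++;
            }
            tokens.add((word1 ? "WORD1" : "WORD2") + " '" + line.substring(start, col) + "'");
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a cde"));  // [WORD1 'a', WORD2 'cde']
        System.out.println(tokenize("abcde"));  // [WORD1 'ab', WORD2 'cde']
    }
}
```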


 

My suspicion is that, for the second line, the ()+ loop in ID1+ is too greedy and consumes one character too many.
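One way to make this suspicion concrete is the following hypothetical model (an assumption for illustration, not how ANTLR 4 actually evaluates lexer predicates): the first letter of a token is checked against its true column, but every further letter in the loop is checked against the column of the letter before it. This lag of one character reproduces both lines of the output above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, NOT ANTLR's implementation: it only illustrates the
// suspicion that the ID1+ loop takes one letter too many. The first letter of
// a token is tested against its true column; every further letter is tested
// against the column of the letter before it (a lag of one character).
class LaggingPredicateModel {

    // Same letter class as [a-zA-Z] in the grammar.
    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Returns tokens for one input line, formatted as "TYPE 'text'".
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        int col = 0;
        while (col < line.length()) {
            if (!isLetter(line.charAt(col))) {  // simplification: skip anything else
                col++;
                continue;
            }
            int start = col;
            boolean word1 = col < 2;  // first letter: predicate sees its true column
            col++;
            // Further letters: the predicate sees col - 1, the modelled lag.
            while (col < line.length()
                    && isLetter(line.charAt(col))
                    && ((col - 1) < 2) == word1) {
                col++;
            }
            tokens.add((word1 ? "WORD1" : "WORD2") + " '" + line.substring(start, col) + "'");
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a cde"));  // [WORD1 'a', WORD2 'cde']
        System.out.println(tokenize("abcde"));  // [WORD1 'abc', WORD2 'de']
    }
}
```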

 

Claus-Dieter
