[antlr-interest] ANTLR4 synpred combination with (..)+ too greedy?

cd.barth at t-online.de
Tue Oct 30 05:18:09 PDT 2012

Using the following grammar


lexer grammar MyLexer;

WORD1                : ID1+;

WORD2                : ID2+;   

fragment ID1 : {getCharPositionInLine()<2}?   [a-zA-Z];

fragment ID2 : {getCharPositionInLine()>=2}? [a-zA-Z];

WS : [ \t\r\n]+ -> skip ;


and printing the lexer tokens with

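for (Token token : lexer.getAllTokens()) {
    int idx = token.getType();
    // map the token type to its symbolic name (WORD1, WORD2, ...)
    String tokenName = lexer.getTokenNames()[idx];
    System.out.format(" %-12s", tokenName);
    // Token.toString() prints [@index,start:stop='text',<type>,line:column]
    System.out.println(token);
}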

for these two input lines

a cde

abcde


I got the following output:

WORD1       [@-1,0:0='a',<1>,1:0]

WORD2       [@-1,2:4='cde',<2>,1:2]


WORD1       [@-1,7:9='abc',<1>,2:0]

WORD2       [@-1,10:11='de',<2>,2:3]


And now my question:

Why is the letter c part of WORD2 in the first line "a cde", but part of WORD1 in the second line "abcde"?
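To make the expected behaviour concrete, here is a hypothetical hand-written tokenizer in Java (the class ExpectedTokens and the method tokenize are illustrative names, not ANTLR API). It assumes the intended reading of the grammar: each letter's predicate tests the column that this letter occupies, so ID1 covers columns 0-1, ID2 covers columns 2 and up, and whitespace is skipped. It is only a sketch of that expectation; it does not reproduce how ANTLR's generated lexer actually evaluates predicates.

    import java.util.ArrayList;
    import java.util.List;

    public class ExpectedTokens {
        // ASCII letters only, mirroring the [a-zA-Z] set in the grammar.
        static boolean isLetter(char c) {
            return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        }

        // Hypothetical reference tokenizer, NOT ANTLR's algorithm: each letter is
        // tested against the column it occupies, so ID1 covers columns 0-1 and
        // ID2 covers columns >= 2. Whitespace is skipped like the WS rule.
        static List<String> tokenize(String input) {
            List<String> tokens = new ArrayList<>();
            int i = 0;
            int col = 0; // column of input.charAt(i) in the current line
            while (i < input.length()) {
                char c = input.charAt(i);
                if (c == '\n') {              // new line: column restarts at 0
                    i++;
                    col = 0;
                } else if (c == ' ' || c == '\t' || c == '\r') {
                    i++;                      // skipped whitespace still occupies a column
                    col++;
                } else if (isLetter(c)) {
                    boolean word1 = col < 2;  // predicate of the first letter picks the rule
                    int start = i;
                    // Greedy (..)+ loop: consume letters while the chosen rule's predicate holds.
                    while (i < input.length() && isLetter(input.charAt(i)) && (col < 2) == word1) {
                        i++;
                        col++;
                    }
                    tokens.add((word1 ? "WORD1" : "WORD2") + "='" + input.substring(start, i) + "'");
                } else {
                    throw new IllegalArgumentException("no rule matches '" + c + "' at index " + i);
                }
            }
            return tokens;
        }

        public static void main(String[] args) {
            // Same two input lines as in the question, separated by CR LF.
            for (String t : tokenize("a cde\r\nabcde")) {
                System.out.println(t);
            }
        }
    }

Run on the same input, this sketch prints WORD1='a', WORD2='cde', WORD1='ab', WORD2='cde'. Under this column-per-letter reading, "abcde" would split into "ab" and "cde", not the observed "abc" and "de".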


My sneaking suspicion is that, for the second line, the (..)+ loop in ID1+ is too greedy and consumes one character too many.





