[antlr-interest] ANTLR4 synpred combination with (..)+ too greedy?

Tue Oct 30 02:16:20 PDT 2012

Using the following grammar

lexer grammar MyLexer;

WORD1 : ID1+;
WORD2 : ID2+;

fragment ID1 : {getCharPositionInLine()<2}?  [a-zA-Z];
fragment ID2 : {getCharPositionInLine()>=2}? [a-zA-Z];

WS : [ \t\r\n]+ -> skip ;

and printing the lexer tokens with

for (Token token : lexer.getAllTokens()) {
    // map the numeric token type to its rule name
    int idx = token.getType();
    String tokenName = lexer.getTokenNames()[idx];
    System.out.format(" %-12s", tokenName);
    System.out.println(token);
}

for these two input lines

a cde

abcde

I get the following output:

WORD1       [@-1,0:0='a',<1>,1:0]
WORD2       [@-1,2:4='cde',<2>,1:2]
WORD1       [@-1,7:9='abc',<1>,2:0]
WORD2       [@-1,10:11='de',<2>,2:3]

And now my question:

Why is the letter c in the first line, "a cde", part of WORD2, while in the
second line, "abcde", it is part of WORD1? In both lines c sits at column 2.

My sneaking suspicion is that, for the second line, the ()+ construct in
ID1+ is too greedy and consumes one character too many.
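To make the expectation concrete, here is a minimal plain-Java sketch of the
column rule the two predicates are meant to express. It does not use ANTLR at
all: the class ColumnSplitSketch and the method tokenize are hypothetical names
introduced only for this illustration, and the sketch says nothing about how
the ANTLR lexer evaluates predicates internally. Anything that is not an ASCII
letter is simply skipped, as a rough stand-in for the WS rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not ANTLR code: splits a single input line
// into letter runs according to the column conditions in the grammar.
class ColumnSplitSketch {

    // Mirrors the character set [a-zA-Z] used by ID1 and ID2.
    static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        int col = 0;
        while (col < line.length()) {
            if (!isAsciiLetter(line.charAt(col))) {
                col++; // skip non-letters (assumption: treated like WS)
                continue;
            }
            // Column < 2 corresponds to ID1/WORD1, column >= 2 to ID2/WORD2.
            boolean inWord1 = col < 2;
            int start = col;
            // Extend the token only while each letter satisfies the same
            // column condition as the letter that started the token.
            while (col < line.length()
                    && isAsciiLetter(line.charAt(col))
                    && (col < 2) == inWord1) {
                col++;
            }
            String type = inWord1 ? "WORD1" : "WORD2";
            tokens.add(type + " '" + line.substring(start, col) + "'");
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a cde"));  // [WORD1 'a', WORD2 'cde']
        System.out.println(tokenize("abcde"));  // [WORD1 'ab', WORD2 'cde']
    }
}
```

Under this rule "abcde" splits as ab / cde, which is what I expected from the
grammar, rather than the abc / de that the lexer actually reports.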

Claus-Dieter