[antlr-interest] ANTLR4 synpred combination with (..)+ to greedy?
Sam Harwell
sam at tunnelvisionlabs.com
Tue Oct 30 06:24:45 PDT 2012
For now, you can work around this by moving the predicates in ID1 and ID2 to the right side instead of the left side of the [a-zA-Z] set. The predicates' text can stay the same.
-----Original Message-----
From: Sam Harwell
Sent: Tuesday, October 30, 2012 8:22 AM
To: 'cd.barth at t-online.de'; antlr-interest at antlr.org
Subject: RE: [antlr-interest] ANTLR4 synpred combination with (..)+ to greedy?
For left*most* edge predicates (evaluated before any character of the token is matched), the input index is located where you expect it. For all other predicates in the lexer, the input index is located one character to the left of where you expect it, because consume() is not called before the predicate is evaluated.
This behavior may change in the future, but that certainly explains the behavior you're seeing.
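To make the off-by-one concrete, here is a small toy model in plain Java. It is not ANTLR code and does not use the ANTLR runtime; the class name PredicateLagDemo and its methods are made up for illustration. It assumes only what is described above: a leftmost predicate sees the real column, and every later left-side predicate sees the column of the previous character. The predicateOnRight flag models my reading of the workaround (a predicate placed after [a-zA-Z] sees the column of the character just matched), which is an assumption rather than something checked against the runtime.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the lexer behavior discussed in this thread (single input line,
// so column == index). Hypothetical helper, not part of ANTLR.
class PredicateLagDemo {

    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Returns the end index (exclusive) of the longest run of letters starting
    // at 'start' whose predicate passes: column < 2 for WORD1, column >= 2 for WORD2.
    static int matchRun(String line, int start, boolean word1, boolean predicateOnRight) {
        int pos = start;
        while (pos < line.length() && isLetter(line.charAt(pos))) {
            // Assumed timing: a non-leftmost predicate on the left side is
            // evaluated before the previous character has been consumed.
            int seenColumn = (predicateOnRight || pos == start) ? pos : pos - 1;
            boolean ok = word1 ? seenColumn < 2 : seenColumn >= 2;
            if (!ok) {
                break;
            }
            pos++;
        }
        return pos;
    }

    // Tokenizes one line; non-letters are skipped, like the WS rule.
    static List<String> tokenize(String line, boolean predicateOnRight) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < line.length()) {
            int end1 = matchRun(line, pos, true, predicateOnRight);
            int end2 = matchRun(line, pos, false, predicateOnRight);
            if (end1 == pos && end2 == pos) {
                pos++; // whitespace, or no rule matched here
                continue;
            }
            // Longest match wins; on a tie the rule listed first (WORD1) wins.
            boolean first = end1 >= end2;
            int end = first ? end1 : end2;
            tokens.add((first ? "WORD1:" : "WORD2:") + line.substring(pos, end));
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a cde", false)); // [WORD1:a, WORD2:cde]
        System.out.println(tokenize("abcde", false)); // [WORD1:abc, WORD2:de]
        System.out.println(tokenize("abcde", true));  // [WORD1:ab, WORD2:cde]
    }
}
```

With the predicates on the left, the model reproduces the token texts reported in the original question ('abc' and 'de' for "abcde"). With predicateOnRight set, it splits "abcde" at column 2, which is what the grammar was meant to do.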
-----Original Message-----
From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of cd.barth at t-online.de
Sent: Tuesday, October 30, 2012 4:16 AM
To: antlr-interest at antlr.org
Subject: [antlr-interest] ANTLR4 synpred combination with (..)+ to greedy?
Using the following grammar
lexer grammar MyLexer;
WORD1 : ID1+;
WORD2 : ID2+;
fragment ID1 : {getCharPositionInLine()<2}? [a-zA-Z];
fragment ID2 : {getCharPositionInLine()>=2}? [a-zA-Z];
WS : [ \t\r\n]+ -> skip ;
and looking at lexer tokens with
for (Token token : lexer.getAllTokens()) {
    // Look up the rule name for this token's numeric type.
    int idx = token.getType();
    String tokenName = lexer.getTokenNames()[idx];
    System.out.format(" %-12s", tokenName);
    System.out.println(token);
}
for these two input lines
a cde
abcde
prints the following results:
WORD1 [@-1,0:0='a',<1>,1:0]
WORD2 [@-1,2:4='cde',<2>,1:2]
WORD1 [@-1,7:9='abc',<1>,2:0]
WORD2 [@-1,10:11='de',<2>,2:3]
And now my question:
Why is the letter c part of WORD2 in the first line, "a cde",
but part of WORD1 in the next line, "abcde"?
My sneaking suspicion is that, for the second line, the ()+ construct in
ID1+ is too greedy and consumes one character too many.
Claus-Dieter