[antlr-interest] ANTLR4 synpred combination with (..)+ to greedy?

Terence Parr parrt at cs.usfca.edu
Tue Oct 30 08:08:18 PDT 2012


Hi all. Hmm… we should make predicates work in the lexer the way they do in the parser, but that might be impossible given that lexers are weird.
Ter
On Oct 30, 2012, at 6:22 AM, Sam Harwell wrote:

> For left*most* edge predicates (evaluated before any character of the token is matched), the input index will be located where you expect it. For all other predicates in the lexer, the input index will be located one character to the left of where you expect it, because consume() is not called before the predicate is evaluated.
> 
> This behavior may change in the future, but that certainly explains the behavior you're seeing.
> 
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of cd.barth at t-online.de
> Sent: Tuesday, October 30, 2012 4:16 AM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] ANTLR4 synpred combination with (..)+ to greedy?
> 
> Using the following grammar
> 
> lexer grammar MyLexer;
> 
> WORD1 : ID1+ ;
> WORD2 : ID2+ ;
> 
> fragment ID1 : {getCharPositionInLine() < 2}?  [a-zA-Z] ;
> fragment ID2 : {getCharPositionInLine() >= 2}? [a-zA-Z] ;
> 
> WS : [ \t\r\n]+ -> skip ;
> 
> and looking at the lexer tokens with
> 
> for (Token token : lexer.getAllTokens()) {
>     // Map the numeric token type to its rule name.
>     String tokenName = lexer.getTokenNames()[token.getType()];
>     System.out.format(" %-12s", tokenName);
>     System.out.println(token);
> }
> 
> for these two input lines
> 
> a cde
> abcde
> 
> the following results were printed:
> 
> WORD1       [@-1,0:0='a',<1>,1:0]
> WORD2       [@-1,2:4='cde',<2>,1:2]
> 
> WORD1       [@-1,7:9='abc',<1>,2:0]
> WORD2       [@-1,10:11='de',<2>,2:3]
> 
> And now my question:
> 
> Why is the letter c part of WORD2 in the first line, "a cde", but part of WORD1 in the second line, "abcde"?
> 
> My sneaking suspicion is that, for the second line, the ()+ construct in ID1+ is too greedy and consumes one character too many.
> 
> Claus-Dieter
> 
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
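
The behavior Sam describes can be reproduced with a small toy model. The sketch below is not ANTLR code and does not use the ANTLR runtime. The class name `StalePredicateDemo` and the method `tokenize` are hypothetical, and the model rests on one assumption taken from Sam's explanation: the predicate on the first character of a token sees the true column, while the predicate on every later character sees the column of the previously matched character, because consume() has not yet been called.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the reported behavior; NOT ANTLR's implementation.
class StalePredicateDemo {

    // Mirrors the [a-zA-Z] set used by the ID1 and ID2 fragments.
    static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Splits one input line into WORD1/WORD2 tokens under the assumed
    // "one character to the left" predicate semantics.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < line.length()) {
            if (!isAsciiLetter(line.charAt(i))) {
                i++; // stand-in for the WS rule; the toy skips any non-letter
                continue;
            }
            int start = i;
            // Leftmost-edge predicate: evaluated at the true column.
            boolean isWord1 = start < 2;
            i++;
            while (i < line.length() && isAsciiLetter(line.charAt(i))) {
                // Assumption: later predicates see the previous column.
                int seenColumn = i - 1;
                boolean predicateHolds = isWord1 ? seenColumn < 2 : seenColumn >= 2;
                if (!predicateHolds) {
                    break;
                }
                i++;
            }
            tokens.add((isWord1 ? "WORD1" : "WORD2") + ":" + line.substring(start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a cde")); // prints [WORD1:a, WORD2:cde]
        System.out.println(tokenize("abcde")); // prints [WORD1:abc, WORD2:de]
    }
}
```

Under that assumption, the model yields the same split as the reported output. In "a cde", the c starts a new token, so its predicate is a leftmost-edge predicate and sees column 2. In "abcde", the predicate guarding c sees column 1, so ID1 still matches and c ends up in WORD1.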
