[antlr-interest] Lookahead predicates in the Lexer?

Sam Harwell sam at tunnelvisionlabs.com
Tue Nov 13 11:35:30 PST 2012


Hi Gerald,

I added an example in the form of a new ANTLR 4 unit test. Here is the commit:
https://github.com/sharwell/antlr4/commit/5db5c3393d9c729d1c60340cfab7ad165a300363

The sample grammar is here:
https://github.com/sharwell/antlr4/blob/5db5c3393d9c729d1c60340cfab7ad165a300363/tool/test/org/antlr/v4/test/PositionAdjustingLexer.g4

The unit test, which shows the tokens created for a representative input, is here:
https://github.com/sharwell/antlr4/blob/5db5c3393d9c729d1c60340cfab7ad165a300363/tool/test/org/antlr/v4/test/TestLexerExec.java#L646

Thanks,
--
Sam Harwell
Owner, Lead Developer
http://tunnelvisionlabs.com

-----Original Message-----
From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Gerald Rosenberg
Sent: Tuesday, November 13, 2012 2:09 AM
To: parrt at cs.usfca.edu
Cc: antlr-interest
Subject: Re: [antlr-interest] Lookahead predicates in the Lexer?

Well, that was what I was hoping for.  Using the v4.0b3 jar, the lexer rule

fragment COMMA        : ','    ;
Identifier
     : LETTER ( LETTER | DIGIT | UNDERSCORE )* { ~COMMA }? -> popMode
     | LETTER ( LETTER | DIGIT | UNDERSCORE )*
     ;

generates, in relevant part,

     public void Identifier_action(RuleContext _localctx, int actionIndex) {
         switch (actionIndex) {
         case 0: popMode();  break;
         }
     }
     public boolean Identifier_sempred(RuleContext _localctx, int predIndex) {
         switch (predIndex) {
         case 5: return  ~COMMA ;
         }
         return true;
     }

Switching from the fragment rule to a token rule

Comma : COMMA ;
. . . . { ~Comma }? . . . .

generates
. . . .
         case 5: return  ~Comma ;

It looks as if ANTLR copies the content of the predicate through verbatim,
treating it purely as a native-code boolean expression.
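
For illustration, here is a minimal Java sketch of why a generated
"return ~Comma ;" cannot act as a lookahead test, and of what a hand-written
check might look like instead. Everything in it is an assumption made for the
example: the class name LookaheadSketch, the token type value 5, and the
helpers la1 and nextIsNot, which only mimic peeking at the next unconsumed
character over a plain String rather than a real lexer input stream.

```java
public class LookaheadSketch {
    // Hypothetical token type value; token types are plain int constants.
    static final int COMMA = 5;

    // Stand-in for peeking one character ahead without consuming it.
    // Returns -1 at end of input, loosely mirroring an EOF marker.
    static int la1(String input, int pos) {
        return pos < input.length() ? input.charAt(pos) : -1;
    }

    // True when the character at position pos (the next unconsumed one) is not c.
    static boolean nextIsNot(String input, int pos, char c) {
        return la1(input, pos) != c;
    }

    public static void main(String[] args) {
        // In Java, ~ is the bitwise complement of an int, so ~COMMA is
        // just another int and not a boolean "next token is not a comma".
        System.out.println(~COMMA);

        // Position 3 is where an identifier "abc" would end.
        System.out.println(nextIsNot("abc,def", 3, ',')); // comma follows
        System.out.println(nextIsNot("abc;def", 3, ',')); // something else follows
        System.out.println(nextIsNot("abc", 3, ','));     // end of input
    }
}
```

In a grammar, the corresponding predicate would presumably compare the next
input character against ',' in native code (something along the lines of
{ _input.LA(1) != ',' }? ), but I have not verified that against the 4.0b3 jar.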


On 11/12/2012 5:05 PM, Terence Parr wrote:
> That predicate should work.  If that predicate fails, then that rule will fail and the input will not be consumed for B.
> Ter
> On Nov 12, 2012, at 3:29 PM, Gerald Rosenberg wrote:
>
>> In Antlr4, is there a way to do a fixed lookahead in the lexer predicate
>> without capturing the lookahead token(s)?  In v3, predicates could be
>> used for this purpose.
>>
>> csvRule : A ( Comma B )* ;
>>
>> A : P Q R -> pushMode(Alphabet) ;
>>
>> mode Alphabet;
>> B : X Y Z { ~Comma }? -> popMode
>>     | X Y Z ;
>>
>> In v4, the "~Comma" is presumed to be native code.
>>
>> Basically, looking for a clean, workable way to not require the use of a
>> semicolon to explicitly terminate input that matches the csvRule, yet
>> have a representation in the lexer that can be used as the popMode trigger.
>>
>> I do realize that I can write a predicate method to do a stream scan,
>> but would prefer a non-native code solution if possible.  I also realize
>> that, in the simplest case, csvRule could be pushed down into the
>> lexer.  But where A and B are not just single terminals in the parser,
>> other rules would have to be pushed down as well, making for a bit of a mess.
>>
>> Thanks,
>> Gerald
>>
>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>


