[antlr-interest] Lookahead predicates in the Lexer?

Gerald Rosenberg gerald at certiv.net
Tue Nov 13 13:22:40 PST 2012


Interesting approach: intercept the token at the post-match emit rather than
hand-coding a forward scan.
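The emit-time adjustment can be illustrated outside ANTLR. The Java below is only a standalone simulation. It is not the code from PositionAdjustingLexer.g4 and does not use the ANTLR runtime API. A hypothetical ITEM rule matches an identifier together with a following comma. At emit time, the token end and input position move back to the end of the identifier. The comma is then lexed again as its own token, while the token type still reflects the lookahead. The token names ITEM, COMMA and LAST are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone simulation of "adjust the token at emit time" (no ANTLR dependency).
// Token names ITEM, COMMA and LAST are hypothetical.
class EmitTrimmingLexer {
    private final String input;
    private int pos = 0;

    EmitTrimmingLexer(String input) {
        this.input = input;
    }

    private static boolean isIdentChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    /** Returns the tokens as "TYPE:text" strings. */
    List<String> tokenize() {
        List<String> tokens = new ArrayList<>();
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (Character.isWhitespace(c)) {
                pos++;
            } else if (c == ',') {
                tokens.add("COMMA:,");
                pos++;
            } else if (Character.isLetter(c)) {
                int start = pos;
                int identEnd = pos;
                while (identEnd < input.length() && isIdentChar(input.charAt(identEnd))) {
                    identEnd++;
                }
                // The rule matches past the identifier: ITEM consumes the ',' too.
                boolean commaFollows = identEnd < input.length() && input.charAt(identEnd) == ',';
                pos = commaFollows ? identEnd + 1 : identEnd;
                String type = commaFollows ? "ITEM" : "LAST";
                // Emit-time adjustment: keep the type chosen with the lookahead, but
                // reset the accept position to the end of the identifier so the
                // ',' is re-lexed as a separate COMMA token.
                pos = identEnd;
                tokens.add(type + ":" + input.substring(start, pos));
            } else {
                throw new IllegalArgumentException("unexpected '" + c + "' at " + pos);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(new EmitTrimmingLexer("abc, def,ghi").tokenize());
    }
}
```

Running `main` prints `[ITEM:abc, COMMA:,, ITEM:def, COMMA:,, LAST:ghi]`. This matches the grammar under discussion in one respect: it does not handle whitespace between an identifier and its comma. For example, `abc ,def` would yield `LAST:abc`.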

Thanks,
Gerald

On 11/13/2012 11:35 AM, Sam Harwell wrote:
> Hi Gerald,
>
> I added an example in the form of a new ANTLR 4 unit test. Here is the commit:
> https://github.com/sharwell/antlr4/commit/5db5c3393d9c729d1c60340cfab7ad165a300363
>
> The sample grammar is here:
> https://github.com/sharwell/antlr4/blob/5db5c3393d9c729d1c60340cfab7ad165a300363/tool/test/org/antlr/v4/test/PositionAdjustingLexer.g4
>
> The unit test, which shows the tokens created for a representative input, is here:
> https://github.com/sharwell/antlr4/blob/5db5c3393d9c729d1c60340cfab7ad165a300363/tool/test/org/antlr/v4/test/TestLexerExec.java#L646
>
> Thanks,
> --
> Sam Harwell
> Owner, Lead Developer
> http://tunnelvisionlabs.com
>
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Gerald Rosenberg
> Sent: Tuesday, November 13, 2012 2:09 AM
> To: parrt at cs.usfca.edu
> Cc: antlr-interest
> Subject: Re: [antlr-interest] Lookahead predicates in the Lexer?
>
> Well, that was what I was hoping for. Using the v4.0b3 jar, the lexer rule
>
> fragment COMMA        : ','    ;
> Identifier
>       : LETTER ( LETTER | DIGIT | UNDERSCORE )* { ~COMMA }? -> popMode
>       | LETTER ( LETTER | DIGIT | UNDERSCORE )*
>       ;
>
> generates, in relevant part,
>
>       public void Identifier_action(RuleContext _localctx, int actionIndex) {
>           switch (actionIndex) {
>           case 0: popMode();  break;
>           }
>       }
>       public boolean Identifier_sempred(RuleContext _localctx, int predIndex) {
>           switch (predIndex) {
>           case 5: return  ~COMMA ;
>           }
>           return true;
>       }
>
> Switching from the fragment rule to a token rule
>
> Comma : COMMA ;
> . . . . { ~Comma }? . . . .
>
> generates
> . . . .
>           case 5: return  ~Comma ;
>
> It is as if ANTLR treats the content of the predicate purely as a
> native-code boolean expression.
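The predicate body is pasted verbatim into `Identifier_sempred`, so a working version of `~COMMA` has to be written as a Java test on the upcoming input. With the ANTLR 4 runtime this is typically written along the lines of `{_input.LA(1) != ','}?`, because a lexer's `_input` is a `CharStream` whose `LA(1)` peeks without consuming. The sketch below does not use the ANTLR runtime. `StringCharStream` and `SempredSketch` are hypothetical stand-ins that only mirror the shape of the generated method.

```java
// Minimal stand-in for a character stream with 1-based lookahead (hypothetical).
class StringCharStream {
    static final int EOF = -1;
    private final String text;
    private int index = 0;

    StringCharStream(String text) {
        this.text = text;
    }

    /** Peeks at the i-th upcoming character (i >= 1) without consuming it. */
    int LA(int i) {
        int at = index + i - 1;
        return at < text.length() ? text.charAt(at) : EOF;
    }

    void consume() {
        index++;
    }

    int index() {
        return index;
    }
}

class SempredSketch {
    // Same shape as the generated Identifier_sempred(): the predicate is plain
    // Java, so "not followed by a comma" becomes an explicit one-character peek.
    static boolean identifierSempred(StringCharStream input, int predIndex) {
        switch (predIndex) {
            case 5:
                return input.LA(1) != ',';
        }
        return true;
    }
}
```

A failing predicate here leaves the stream position unchanged, which is the non-consuming lookahead the thread is after.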
>
>
> On 11/12/2012 5:05 PM, Terence Parr wrote:
>> That predicate should work.  If that predicate fails, then that rule will fail and the input will not be consumed for B.
>> Ter
>> On Nov 12, 2012, at 3:29 PM, Gerald Rosenberg wrote:
>>
>>> In Antlr4, is there a way to do a fixed lookahead in a lexer predicate
>>> without capturing the lookahead token(s)?  In v3, predicates could be
>>> used for this purpose.
>>>
>>> csvRule : A ( Comma B )* ;
>>>
>>> A : P Q R -> pushMode(Alphabet) ;
>>>
>>> mode Alphabet;
>>> B : X Y Z { ~Comma }? -> popMode
>>>   | X Y Z ;
>>>
>>> In v4, the "~Comma" is presumed to be native code.
>>>
>>> Basically, I am looking for a clean, workable way to avoid requiring a
>>> semicolon to explicitly terminate input that matches csvRule, while still
>>> having a representation in the lexer that can serve as the popMode trigger.
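The behavior being asked for can be sketched as a hand-written Java lexer with an explicit mode stack. This is a simulation under assumptions, not ANTLR-generated code. The trigger word `pqr` stands in for `A : P Q R`, any word lexed in ALPHABET mode counts as a `B`, and all names are hypothetical. After each `B`, the lexer peeks at one character. If that character is not a comma, the lexer pops back to the default mode, so no terminating semicolon is needed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical two-mode lexer: pushMode on A, popMode when a B is not followed by ','.
class ModeStackLexer {
    enum Mode { DEFAULT, ALPHABET }

    private final String input;
    private int pos = 0;
    private Mode mode = Mode.DEFAULT;
    private final Deque<Mode> modeStack = new ArrayDeque<>();

    ModeStackLexer(String input) {
        this.input = input;
    }

    private void pushMode(Mode m) {
        modeStack.push(mode);
        mode = m;
    }

    private void popMode() {
        mode = modeStack.pop();
    }

    private String word() {
        int start = pos;
        while (pos < input.length() && Character.isLetter(input.charAt(pos))) {
            pos++;
        }
        return input.substring(start, pos);
    }

    List<String> tokenize() {
        List<String> tokens = new ArrayList<>();
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (c == ' ') {
                pos++;
            } else if (mode == Mode.ALPHABET && c == ',') {
                tokens.add("Comma");
                pos++;
            } else if (Character.isLetter(c)) {
                String text = word();
                if (mode == Mode.DEFAULT) {
                    if (text.equals("pqr")) {
                        tokens.add("A");
                        pushMode(Mode.ALPHABET);
                    } else {
                        tokens.add("WORD(" + text + ")");
                    }
                } else {
                    tokens.add("B(" + text + ")");
                    // Peek only: the ',' (if any) stays in the input for the Comma token.
                    boolean commaNext = pos < input.length() && input.charAt(pos) == ',';
                    if (!commaNext) {
                        popMode(); // last list element: leave ALPHABET mode
                    }
                }
            } else {
                throw new IllegalArgumentException("unexpected '" + c + "' at " + pos);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(new ModeStackLexer("pqr xyz,abc,def end").tokenize());
    }
}
```

Running `main` prints `[A, B(xyz), Comma, B(abc), Comma, B(def), WORD(end)]`. Here `end` is lexed in the default mode again because `def` was not followed by a comma.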
>>>
>>> I do realize that I can write a predicate method to do a stream scan,
>>> but would prefer a non-native-code solution if possible.  I also realize
>>> that, in the simplest case, csvRule could be pushed down into the
>>> lexer.  Where A and B are not just single terminals in the parser,
>>> other rules would have to be pushed down as well, making for a bit of a mess.
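A native predicate method of the kind mentioned here could scan forward over blanks to find the next significant character. It would use only lookahead, so nothing is consumed. In an ANTLR 4 lexer the peeks would go through the char stream's `LA(i)`. In this standalone sketch, a hypothetical `lookaheadOver` adapter supplies the same 1-based lookahead over a plain string. The name `followedByComma` is also an assumption. Negated, such a method could gate the popMode alternative.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical forward-scan helper for a lexer predicate.
class CommaScan {
    static final int EOF = -1;

    /**
     * Skips blanks using 1-based lookahead la(1), la(2), ... and reports whether
     * the next significant character is ','. Nothing is consumed.
     */
    static boolean followedByComma(IntUnaryOperator la) {
        for (int i = 1; ; i++) {
            int c = la.applyAsInt(i);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                continue; // blank: keep looking further ahead
            }
            return c == ','; // EOF (-1) or any other character ends the scan
        }
    }

    /** Adapts a string and a current index to the la(i) shape used above. */
    static IntUnaryOperator lookaheadOver(String text, int index) {
        return i -> {
            int at = index + i - 1;
            return at < text.length() ? text.charAt(at) : EOF;
        };
    }
}
```

Unlike the one-character peek, this version also accepts input such as `xyz  , abc`, where blanks separate an element from its comma.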
>>>
>>> Thanks,
>>> Gerald
>>>
>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address



