[antlr-interest] Lookahead predicates in the Lexer?
Gerald Rosenberg
gerald at certiv.net
Tue Nov 13 10:43:37 PST 2012
IIRC, the old-style "( A B )=>" syntactic predicate could define fixed,
rule-based lookaheads. I used it to define a modal lexer:
TAG_OPEN :
    ( ( '<' LETTER ) => '<' { lexMode = TAG; printState("-TOpen "); }
    | ( '<' ~LETTER ) => '<' { lexMode = TEXT; $type=PCDATA; printState("BOpen "); }
    ) ;
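The two syntactic predicates in TAG_OPEN amount to a fixed two-character lookahead: after '<', peek at the next character and pick the mode. Below is a minimal Java sketch of that decision, for illustration only. It assumes a plain `String` input, a hypothetical `LexMode` enum standing in for `lexMode`, and `Character.isLetter` as a stand-in for whatever LETTER actually matches. The helper `la` mimics the 1-based `LA(i)` convention of ANTLR's `IntStream`, returning -1 (EOF) past the end of input.

```java
public class TagOpenLookahead {
    // Hypothetical stand-in for the lexMode variable in the grammar above.
    enum LexMode { TAG, TEXT }

    static final int EOF = -1;

    // 1-based lookahead relative to pos, in the style of IntStream.LA(i).
    static int la(String input, int pos, int i) {
        int idx = pos + i - 1;
        return idx < input.length() ? input.charAt(idx) : EOF;
    }

    // Decides the mode for a '<' at pos: '<' LETTER opens a tag,
    // any other '<' is treated as character data. Nothing is consumed.
    static LexMode classifyLt(String input, int pos) {
        if (la(input, pos, 1) != '<') {
            throw new IllegalArgumentException("expected '<' at " + pos);
        }
        int next = la(input, pos, 2);
        return (next != EOF && Character.isLetter(next)) ? LexMode.TAG : LexMode.TEXT;
    }

    public static void main(String[] args) {
        System.out.println(classifyLt("<div>", 0)); // TAG
        System.out.println(classifyLt("a < b", 2)); // TEXT
    }
}
```

The appeal of the v3 form is that the generated lexer performed this peek itself, without any hand-written scanning code.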
I was hoping that functionality had been folded into the v4 predicates.
Having the generated lexer do the scan-ahead will always, or almost
always, be more efficient than a hand-written scan; at the very least,
Antlr can reuse the results of the scan it performs.
Any thought of adding this capability as an enhancement?
BTW, the new ()*? operator is nice -- explicit and succinct.
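Absent such an enhancement, the v4 route is a native predicate that peeks ahead without consuming anything. In the Java target such a predicate would read the lexer's `_input` stream via `LA(i)`. The sketch here assumes a plain `String` and index instead, uses a hypothetical helper name `notFollowedByComma`, and (as an extra assumption) skips spaces and tabs before checking for the comma. It expresses the intent behind the `{ ~COMMA }?` attempt quoted further down.

```java
public class CommaLookahead {
    static final int EOF = -1;

    // 1-based lookahead in the style of IntStream.LA(i); -1 past the end.
    static int la(String input, int pos, int i) {
        int idx = pos + i - 1;
        return idx < input.length() ? input.charAt(idx) : EOF;
    }

    // True when the next non-blank character at or after pos is not ','.
    // Only peeks: pos is never advanced, so no input is consumed.
    static boolean notFollowedByComma(String input, int pos) {
        int i = 1;
        int c = la(input, pos, i);
        while (c == ' ' || c == '\t') {
            c = la(input, pos, ++i);
        }
        return c != ',';
    }

    public static void main(String[] args) {
        // pos is the index just past a matched identifier "xyz".
        System.out.println(notFollowedByComma("xyz , abc", 3)); // false
        System.out.println(notFollowedByComma("xyz abc", 3));   // true
        System.out.println(notFollowedByComma("xyz", 3));       // true (EOF)
    }
}
```

This is exactly the kind of hand-written scan the generated lexer would otherwise do for you; its cost grows with however far the helper has to look.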
On 11/13/2012 9:33 AM, Terence Parr wrote:
> predicates have always been native code, though, right?
> Ter
> On Nov 13, 2012, at 12:09 AM, Gerald Rosenberg wrote:
>
>> Well, that was what I was hoping for. Using the v4.0b3 jar, the lexer rule
>>
>> fragment COMMA : ',' ;
>> Identifier
>> : LETTER ( LETTER | DIGIT | UNDERSCORE )* { ~COMMA }? -> popMode
>> | LETTER ( LETTER | DIGIT | UNDERSCORE )*
>> ;
>>
>> generates, in relevant part,
>>
>> public void Identifier_action(RuleContext _localctx, int actionIndex) {
>> switch (actionIndex) {
>> case 0: popMode(); break;
>> }
>> }
>> public boolean Identifier_sempred(RuleContext _localctx, int predIndex) {
>> switch (predIndex) {
>> case 5: return ~COMMA ;
>> }
>> return true;
>> }
>>
>> Switching from the fragment rule to a token rule
>>
>> Comma : COMMA ;
>> . . . . { ~Comma }? . . . .
>>
>> generates
>> . . . .
>> case 5: return ~Comma ;
>>
>> It is as if Antlr treats the content of the predicate purely as a native-code boolean expression.
>>
>>
>> On 11/12/2012 5:05 PM, Terence Parr wrote:
>>> That predicate should work. If that predicate fails, then that rule will fail and the input will not be consumed for B.
>>> Ter
>>> On Nov 12, 2012, at 3:29 PM, Gerald Rosenberg wrote:
>>>
>>>> In Antlr4, is there a way to do a fixed lookahead in the lexer predicate
>>>> without capturing the lookahead token(s)? In v3, predicates could be
>>>> used for this purpose.
>>>>
>>>> csvRule : A ( Comma B )* ;
>>>>
>>>> A : P Q R -> pushMode(Alphabet) ;
>>>>
>>>> mode Alphabet;
>>>> B : X Y Z { ~Comma }? -> popMode
>>>>   | X Y Z ;
>>>>
>>>> In v4, the "~Comma" is treated as native code.
>>>>
>>>> Basically, I'm looking for a clean, workable way to avoid requiring a
>>>> semicolon to explicitly terminate input that matches csvRule, while still
>>>> having a representation in the lexer that can serve as the popMode trigger.
>>>>
>>>> I do realize that I can write a predicate method to do a stream scan,
>>>> but would prefer a non-native-code solution if possible. I also realize
>>>> that, in the simplest case, csvRule could be pushed down into the
>>>> lexer. Where A and B are not just single terminals in the parser,
>>>> other rules would have to be pushed down too, making for a bit of a mess.
>>>>
>>>> Thanks,
>>>> Gerald
>>>>
>>
>
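For the csvRule case, the intended flow is: `A` pushes the `Alphabet` mode, and the last `B` (the one not followed by a comma) pops it, so no terminating semicolon is needed. Below is a toy sketch of that flow in plain Java. It is not ANTLR-generated code, and it makes several hypothetical simplifications: `A` is the literal `PQR`, `B` is the literal `XYZ`, whitespace is skipped, commas are recognized in either mode, and a `Deque` stands in for the lexer's mode stack.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class ModalCsvLexer {
    enum Mode { DEFAULT, ALPHABET }

    // Returns token names, with mode transitions recorded as "push"/"pop".
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        Deque<Mode> modes = new ArrayDeque<>();
        modes.push(Mode.DEFAULT);
        int pos = 0;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (Character.isWhitespace(c)) { pos++; continue; }
            if (modes.peek() == Mode.DEFAULT && input.startsWith("PQR", pos)) {
                out.add("A");
                pos += 3;
                modes.push(Mode.ALPHABET);               // A -> pushMode(Alphabet)
                out.add("push");
            } else if (modes.peek() == Mode.ALPHABET && input.startsWith("XYZ", pos)) {
                out.add("B");
                pos += 3;
                if (!nextNonBlankIsComma(input, pos)) {  // the { ~Comma }? intent
                    modes.pop();                         // B -> popMode
                    out.add("pop");
                }
            } else if (c == ',') {
                out.add("Comma");
                pos++;
            } else {
                throw new IllegalStateException("unexpected '" + c + "' at " + pos);
            }
        }
        return out;
    }

    // Peeks past whitespace without consuming; true if a ',' comes next.
    static boolean nextNonBlankIsComma(String input, int pos) {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) pos++;
        return pos < input.length() && input.charAt(pos) == ',';
    }

    public static void main(String[] args) {
        System.out.println(tokenize("PQR, XYZ, XYZ PQR"));
        // [A, push, Comma, B, Comma, B, pop, A, push]
    }
}
```

Because B pops only when the lookahead finds no comma, the second `PQR` is lexed back in the default mode without any terminating semicolon. As in the original grammar, an `A` followed by no `B` at all would leave the lexer in the Alphabet mode.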