[antlr-interest] greedy vs nongreedy lexer rules

Sun Apr 18 14:29:02 PDT 2010

On Apr 18, 2010, at 2:02 PM, Terence Parr wrote:
>  More importantly, I'm approximating recursive lexer rules with a DFA and then will invoke the recursive method at runtime after I've distinguished the input from other rules.  What I mean is that, I really kind of need to build a DFA :)

Hmm...if we allow a stack of lexical states ("modes") then we don't need recursive lexer rules, which I rarely use anyway.  Here's how we could do nested comments:

ID : ... ;
INT : ... ; // usual stuff 
CMT_START : '/*' {pushMode(COMMENTS);} ;

mode COMMENTS:

NESTED_CMT_START : '/*' {pushMode(COMMENTS);} ;
CMT_STOP : '*/' {popMode();} ;
ANY : . ;

That's not as "cool" as this though:

ID : ... ;
INT : ... ; // usual stuff 
CMT : '/*' (CMT | .)* '*/' ;

That said, my current thoughts on impl would match CMT approximately and then rewind to call the generated CMT method and exec it as if it were a parser rule.  Less efficient.  Worse, if approx predicted two recursive methods, I'd have to try both with backtracking...hmm...so maybe we really should avoid recursive lexer rules in favor of states, which handles nongreedy situations and recursion.

Ter