[antlr-interest] greedy vs nongreedy lexer rules

Sun Apr 18 15:40:15 PDT 2010

Hi Kyle.  Thanks for the thoughts!  I'm also having more evil thoughts.  

The ANTLR lexers are really out of control in what they allow just to support edge cases.  For MOST grammars, you have no actions in lexer rules except for skip() calls in whitespace rules etc...  Some are complicated like ANTLR's action splitter. here's a few rules:

SET_DYNAMIC_SCOPE_ATTR
	:	'$' x=ID '::' y=ID WS? '=' expr=ATTR_VALUE_EXPR ';'
		{delegate.setDynamicScopeAttr($text, $x, $y, $expr);}
	;

DYNAMIC_SCOPE_ATTR
	:	'$' x=ID '::' y=ID {delegate.dynamicScopeAttr($text, $x, $y);}
	;

QUALIFIED_ATTR
	:	'$' x=ID '.' y=ID {input.LA(1)!='('}? {delegate.qualifiedAttr($text, $x, $y);}
	;

Actions are at right edges (easy to do) but they ref labels from rule refs.  I can implement this easily enough with a DFA that saves named substrings and then ref them in the action.  But, actions sort of imply I'm going to generate code for the rules. I would LOVE to do away with lexer code gen (makes new targets easier too).  With predicates and actions in middle of rules, though, we'd have to stuff those in another "support" function somewhere and then exec them AFTER we match rules in case we have an ambiguous case.  For example:

FOO : 'f' {an-action} 'oo' ;
ID : 'a'..'z'+ ;

Here, after matching 'f', we can't distinguish FOO vs ID yet we have to exec an action!  The only way is to match FOO vs ID with the DFA and then rewind and exec FOO (the winner). Ugh. That means generating a FOO() method.  Or, we could simply disallow ambig action exec, which is easy for me to detect in the NFA->DFA conversion.

What about local variables?

DUH : {int n=0;} ('a'..'z' {n++;})+ {do something with n;} ;

can't yank {int n=0;} into its own function.  I'm thinking we need to formalize locals so I can avoid genrating code that won't compile.

What about backward compatibility?  Losing recursion breaks some grammars.  Formalizing locals breaks some.  Perhaps easy answer is to simply allow v3 lexers to hook in to v4 parsers.  The imports within the v3 lexer would have change to

import org.antlr.v4.runtime.legacy.Lexer;

etc... but we could make it work.

A tough decision.  I'm aiming for really small lexers w/o code gen except for user actions and semantic predicates.  

Ter