[antlr-interest] greedy vs nongreedy lexer rules

Sun Apr 18 16:04:41 PDT 2010

With respect to local variables and actions in ambiguous sets of rules, it
seems to me that the entire rule alternative is the scope for all actions
that appear in it, so an action that declares a variable and a later action
in the same alternative that executes some code are really all one method.
What would need to be dealt with is that the language target generator would
have to take the state pulled from the DFA and insert that information into
the alternative's action sequence, so that each action has access to the
logical state at the point in the match where it would have executed.

For instance, in the rule:

FOO: { int n=4; } 'a'* { n += $text.Length; } 'bcd'
     { Console.WriteLine("{0}: {1}", n, $text); } ;

the alternative's action function would look like:

void foo_alt1(State[] states)
{
    int n=4;
    n += states[0].Text.Length;   // $text as of the action after 'a'*
    Console.WriteLine("{0}: {1}", n, states[1].Text);   // $text after 'bcd'
}

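The State[] is an output of the DFA, with one entry per action that refers
to lexer state.  Because the actions run only after the DFA has picked the
winning alternative, ambiguity no longer has any effect on your ability to
execute actions, but the language targets would need to be rewritten.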
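
To make that concrete, here is a minimal Java sketch of the idea, not
anything ANTLR generates today.  The State class, matchFoo (a hand-written
stand-in for the DFA) and fooAlt1 are all hypothetical names, and the sketch
returns the formatted string instead of printing it inside the action so the
result is easy to check.

```java
class FooActionSketch {
    /** Hypothetical snapshot of lexer state taken where an embedded action sits. */
    static final class State {
        final String text;
        State(String text) { this.text = text; }
    }

    /**
     * Hand-written stand-in for the DFA of FOO: 'a'* 'bcd'.
     * Matches a prefix of the input and runs no user code; it only records
     * what $text would be at each action position that refers to it.
     * Returns null when the input does not start with a FOO token.
     */
    static State[] matchFoo(String input) {
        int i = 0;
        while (i < input.length() && input.charAt(i) == 'a') {
            i++;
        }
        State afterAs = new State(input.substring(0, i));       // $text after 'a'*
        if (!input.startsWith("bcd", i)) {
            return null;
        }
        State afterBcd = new State(input.substring(0, i + 3));  // $text after 'bcd'
        return new State[] { afterAs, afterBcd };
    }

    /**
     * All actions of the alternative merged into one method, so the local n
     * declared by the first action is still in scope for the later ones.
     */
    static String fooAlt1(State[] states) {
        int n = 4;                       // first action
        n += states[0].text.length();    // second action
        return n + ": " + states[1].text; // third action
    }

    public static void main(String[] args) {
        State[] states = matchFoo("aaabcd");
        if (states != null) {
            // Actions run only after the match has succeeded.
            System.out.println(fooAlt1(states));
        }
    }
}
```

For the input "aaabcd" this yields "7: aaabcd": n starts at 4 and picks up
the three characters matched by 'a'*.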

On Sun, Apr 18, 2010 at 3:40 PM, Terence Parr <parrt at cs.usfca.edu> wrote:

> Hi Kyle.  Thanks for the thoughts!  I'm also having more evil thoughts.
>
> The ANTLR lexers are really out of control in what they allow just to
> support edge cases.  For MOST grammars, you have no actions in lexer rules
> except for skip() calls in whitespace rules, etc.  Some are complicated,
> like ANTLR's action splitter.  Here are a few rules:
>
> SET_DYNAMIC_SCOPE_ATTR
>        :       '$' x=ID '::' y=ID WS? '=' expr=ATTR_VALUE_EXPR ';'
>                {delegate.setDynamicScopeAttr($text, $x, $y, $expr);}
>        ;
>
> DYNAMIC_SCOPE_ATTR
>        :       '$' x=ID '::' y=ID
>                {delegate.dynamicScopeAttr($text, $x, $y);}
>        ;
>
> QUALIFIED_ATTR
>        :       '$' x=ID '.' y=ID {input.LA(1)!='('}?
>                {delegate.qualifiedAttr($text, $x, $y);}
>        ;
>
> Actions are at right edges (easy to do) but they ref labels from rule refs.
>  I can implement this easily enough with a DFA that saves named substrings
> and then ref them in the action.  But, actions sort of imply I'm going to
> generate code for the rules. I would LOVE to do away with lexer code gen
> (makes new targets easier too).  With predicates and actions in middle of
> rules, though, we'd have to stuff those in another "support" function
> somewhere and then exec them AFTER we match rules in case we have an
> ambiguous case.  For example:
>
> FOO : 'f' {an-action} 'oo' ;
> ID : 'a'..'z'+ ;
>
> Here, after matching 'f', we can't distinguish FOO vs ID yet we have to
> exec an action!  The only way is to match FOO vs ID with the DFA and then
> rewind and exec FOO (the winner). Ugh. That means generating a FOO() method.
>  Or, we could simply disallow ambig action exec, which is easy for me to
> detect in the NFA->DFA conversion.
>
> What about local variables?
>
> DUH : {int n=0;} ('a'..'z' {n++;})+ {do something with n;} ;
>
> We can't yank {int n=0;} into its own function.  I'm thinking we need to
> formalize locals so I can avoid generating code that won't compile.
>
> What about backward compatibility?  Losing recursion breaks some grammars.
> Formalizing locals breaks some.  Perhaps the easy answer is to simply allow
> v3 lexers to hook into v4 parsers.  The imports within the v3 lexer would
> have to change to
>
> import org.antlr.v4.runtime.legacy.Lexer;
>
> etc... but we could make it work.
>
> A tough decision.  I'm aiming for really small lexers w/o code gen except
> for user actions and semantic predicates.
>
> Ter