[antlr-interest] greedy vs nongreedy lexer rules

Cliff Hudson cliff.s.hudson at gmail.com
Mon Apr 19 11:39:26 PDT 2010


Yes, predicates are a flaw in the plan.

On Mon, Apr 19, 2010 at 10:59 AM, Daniels, Troy (US SSA) <
troy.daniels at baesystems.com> wrote:

>
>
> > -----Original Message-----
> > From: antlr-interest-bounces at antlr.org
> > [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Cliff Hudson
> > Sent: Sunday, April 18, 2010 8:10 PM
> > To: Terence Parr
> > Cc: ANTLR interest
> > Subject: Re: [antlr-interest] greedy vs nongreedy lexer rules
> >
> > Sorry, accidentally sent a partial message.  See below for
> > the complete message.
> >
> > Ahh, that's what I was getting at.  In my system, the actions
> > would not be
> > > embedded directly in the matching code as they are now.
> > Instead, all
> > > of the actions which were part of an alternative would be executed
> > > together against the global state.  Consider the following
>
> That fails when there are predicates in the rule that use the variables.
>  Consider this rule, based on something that was discusses on the list a
> while ago.
>
> Fragment LETTER: 'a'..'z';
>
> OneToFiveLetters:
> @init {
>  int i = 0;
> }
>
> ( {i<5}? LETTER {++i;} )+ ;
>
>
> The syntax probably isn't quite right, but the intent is to use a variable
> to accept one through five letters as a single token.  If you don't execute
> the actions until after you finish the rule, the predicate always passes, so
> aaaaaaaaaa would be lexed as a single OneToFiveLetters token.
>
> Troy
>
> > (assuming
> > > the C# language target) based on the example you gave above.  The
> > > implementation of the Lexer subclass would look as follows:
> > >
> > > public class MyLexer : Lexer
> > > {
> > >   int i = 0; // from @members
> > >
> > >   ... DFA stuff, token decls, etc...
> > >   public void ID()
> > >   {
> > >      // Implemented as normal, except that where {i++}
> > would be, there
> > > would be
> > >      // code which instead saves the action-visible state of the
> > > parser into a State block.
> > >      // This visible state means only the values of those variables
> > > which are generated by the
> > >      // running the the DFA, not those which might be
> > updated by the
> > > actions.  This
> > >      // snapshot is then enqueued.
> > >
> > >         // Once ID is definitely matched, ID_alt1 itself is
> > enqueued
> > > along
> > with a the queued states.
> >
> > >   }
> > >
> > > public void ID_alt1(Queue<State> states) {
> > >     State currentState = states.Dequeue();
> > >     i++; // While state information was enqueued, it turns out we
> > > didn't use it here.
> > >            // A language target would have translated any
> > references
> > > to action-visible
> > >            // token information such as $text into references into
> > > currentState.
> > >
> >        currentState = states.Dequeue();
> >
> > >     System.Console.WriteLine("{0}", i); // Again, no information is
> > > used, but if it were, it would be
> > >
> >                                                            //
> > from a reference into currentState.
> >
> > > }
> > >
> > > Once a particular alternative is selected, the action
> > function would
> > > be
> > executed with the enqueued states.
> > No alternative could be executed if it was below any
> > incomplete alternative (in otherwords, fragments would not
> > execute until the alternative referencing them itself was definitively
> > selected.)
> >
> > Does that make more sense?
> >
> > >
> > > On Sun, Apr 18, 2010 at 4:53 PM, Terence Parr
> > <parrt at cs.usfca.edu> wrote:
> > >
> > >>
> > >> On Apr 18, 2010, at 4:50 PM, Cliff Hudson wrote:
> > >>
> > >> > You wrote:
> > >> >
> > >> > Well, imagine that you are modifying "global" state as you match
> > >> characters in identifier; this is something done in actions that
> > >> ANTLR can analyze. There is no way to "roll this back".
> > >> >
> > >> > The 'global' state you are talking about is state the DFA is
> > >> > modifying
> > >> but which an action in a rule could examine?  Is the
> > amount of state
> > >> visible to actions too much to store while alternatives
> > are being evaulated?
> > >>
> > >> It's state you set up and modify then ref:
> > >>
> > >> @members {
> > >> int i = 0;
> > >> }
> > >>
> > >> ...
> > >>
> > >> ID : ('a'..'z' {i++;})+ {System.out.println(i);} ;
> > >>
> > >> ANTLR can't save i for you each char iteration.
> > >>
> > >> Ter
> > >>
> > >> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > >> Unsubscribe:
> > >>
> > http://www.antlr.org/mailman/options/antlr-interest/your-email-addres
> > >> s
> > >>
> > >
> > >
> >
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > Unsubscribe:
> > http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> >
>


More information about the antlr-interest mailing list