[antlr-interest] greedy vs nongreedy lexer rules

Daniels, Troy (US SSA) troy.daniels at baesystems.com
Mon Apr 19 10:59:42 PDT 2010


 

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org 
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Cliff Hudson
> Sent: Sunday, April 18, 2010 8:10 PM
> To: Terence Parr
> Cc: ANTLR interest
> Subject: Re: [antlr-interest] greedy vs nongreedy lexer rules
> 
> Sorry, accidentally sent a partial message.  See below for 
> the complete message.
> 
> Ahh, that's what I was getting at.  In my system, the actions 
> would not be
> > embedded directly in the matching code as they are now.  
> Instead, all 
> > of the actions which were part of an alternative would be executed 
> > together against the global state.  Consider the following 

That fails when there are predicates in the rule that use the variables.  Consider this rule, based on something that was discusses on the list a while ago.

Fragment LETTER: 'a'..'z';

OneToFiveLetters:
@init {
 int i = 0;
}

( {i<5}? LETTER {++i;} )+ ;


The syntax probably isn't quite right, but the intent is to use a variable to accept one through five letters as a single token.  If you don't execute the actions until after you finish the rule, the predicate always passes, so aaaaaaaaaa would be lexed as a single OneToFiveLetters token.

Troy

> (assuming 
> > the C# language target) based on the example you gave above.  The 
> > implementation of the Lexer subclass would look as follows:
> >
> > public class MyLexer : Lexer
> > {
> >   int i = 0; // from @members
> >
> >   ... DFA stuff, token decls, etc...
> >   public void ID()
> >   {
> >      // Implemented as normal, except that where {i++} 
> would be, there 
> > would be
> >      // code which instead saves the action-visible state of the 
> > parser into a State block.
> >      // This visible state means only the values of those variables 
> > which are generated by the
> >      // running the the DFA, not those which might be 
> updated by the 
> > actions.  This
> >      // snapshot is then enqueued.
> >
> >         // Once ID is definitely matched, ID_alt1 itself is 
> enqueued 
> > along
> with a the queued states.
> 
> >   }
> >
> > public void ID_alt1(Queue<State> states) {
> >     State currentState = states.Dequeue();
> >     i++; // While state information was enqueued, it turns out we 
> > didn't use it here.
> >            // A language target would have translated any 
> references 
> > to action-visible
> >            // token information such as $text into references into 
> > currentState.
> >
>        currentState = states.Dequeue();
> 
> >     System.Console.WriteLine("{0}", i); // Again, no information is 
> > used, but if it were, it would be
> >
>                                                            // 
> from a reference into currentState.
> 
> > }
> >
> > Once a particular alternative is selected, the action 
> function would 
> > be
> executed with the enqueued states.
> No alternative could be executed if it was below any 
> incomplete alternative (in otherwords, fragments would not 
> execute until the alternative referencing them itself was definitively
> selected.)
> 
> Does that make more sense?
> 
> >
> > On Sun, Apr 18, 2010 at 4:53 PM, Terence Parr 
> <parrt at cs.usfca.edu> wrote:
> >
> >>
> >> On Apr 18, 2010, at 4:50 PM, Cliff Hudson wrote:
> >>
> >> > You wrote:
> >> >
> >> > Well, imagine that you are modifying "global" state as you match
> >> characters in identifier; this is something done in actions that 
> >> ANTLR can analyze. There is no way to "roll this back".
> >> >
> >> > The 'global' state you are talking about is state the DFA is 
> >> > modifying
> >> but which an action in a rule could examine?  Is the 
> amount of state 
> >> visible to actions too much to store while alternatives 
> are being evaulated?
> >>
> >> It's state you set up and modify then ref:
> >>
> >> @members {
> >> int i = 0;
> >> }
> >>
> >> ...
> >>
> >> ID : ('a'..'z' {i++;})+ {System.out.println(i);} ;
> >>
> >> ANTLR can't save i for you each char iteration.
> >>
> >> Ter
> >>
> >> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> >> Unsubscribe:
> >> 
> http://www.antlr.org/mailman/options/antlr-interest/your-email-addres
> >> s
> >>
> >
> >
> 
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: 
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> 


More information about the antlr-interest mailing list