[antlr-interest] Advice with backtracking/ambiguity

Jim Idle jimi at temporal-wave.com
Wed Jun 2 15:44:01 PDT 2010


You just need to test that no hidden whitespace tokens fell between the parser tokens that must be adjacent, and reject the rule if any did. It is much neater to use the predicate in the lexer, though, and I do not think it makes things look strange at all; you just get used to it.

But if there are too many of these predicates in the lexer, then maybe you should be using the parser, or perhaps the task is better suited to awk or a filtering lexer.
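
The adjacency test for the parser-rule approach can be sketched roughly as follows. This is a minimal, standalone illustration rather than grammar code: Tok and TokenAdjacency are hypothetical names that are not part of ANTLR. In an ANTLR 3 parser the same offsets would come from CommonToken.getStartIndex() and getStopIndex(), and the check would sit in a semantic predicate or action on the date rule. It assumes the stop offset is inclusive, as it is in ANTLR 3.

```java
// Hypothetical stand-in for a lexer token. With ANTLR 3 the same offsets
// are available from CommonToken.getStartIndex() and getStopIndex().
final class Tok {
    final String text;
    final int start; // offset of the first character in the input
    final int stop;  // offset of the last character (inclusive)

    Tok(String text, int start, int stop) {
        this.text = text;
        this.start = start;
        this.stop = stop;
    }
}

final class TokenAdjacency {
    // True when every token starts immediately after the previous one
    // ends, i.e. no skipped or hidden characters sit between them.
    static boolean contiguous(Tok... toks) {
        for (int i = 1; i < toks.length; i++) {
            if (toks[i].start != toks[i - 1].stop + 1) {
                return false;
            }
        }
        return true;
    }
}
```

For the input 1/2/2010 the five tokens cover offsets 0 through 7 with no gaps, so the check passes. For 1 / 2 the skipped spaces leave gaps between the tokens and the check fails, which is the case where the date rule should be rejected.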

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of John B. Brodie
> Sent: Wednesday, June 02, 2010 3:39 PM
> To: Ken Williams
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Advice with backtracking/ambiguity
> 
> On Wed, 2010-06-02 at 17:03 -0500, Ken Williams wrote:
> > Yeah, probably I should be using parser rules.  I was trying to keep
> > things "simple" by making everything a linear stream of tokens from
> > the point of view of the Java caller, while still having high-level
> > constructs like DATE.
> >
> > Perhaps what I really want is something like this:
> 
> just be aware that when you make date a parser rule WS will be silently
> accepted between the DIGITS and SLASHes comprising the date
> non-terminal.
> 
> if your language permits this, great! otherwise you would need to keep
> it in the lexer.
> 
> i do not know what $text will be for a parser rule and do not know
> whether any skip()'d WS will (or not) be included.
> 
> (same drill for COMMENT if you have that and/or any other skip()'d or
> HIDDEN token within the lexer)
> 
> >
> > -------------------
> > options {
> >     backtrack=true;
> >     memoize=true;
> >     output=AST;
> > }
> >
> > tokens {
> >     DATE;
> > }
> >
> > cite    :    token+ EOF ;
> > token   :    date -> DATE | SLASH | DIGITS;
> > date    :    DIGITS SLASH DIGITS SLASH DIGITS ;
> >
> > SLASH   :    '/' ;
> > DIGITS  :    ('0'..'9')+ ;
> > WS      :    ( ' ' | '\t'| '\f' | '\n' | '\r' ) {skip();} ;
> > -------------------
> >
> >
> > The only thing missing now is the character-data from DATE.  Is there
> > a way to change that 'token' rule to something like this?
> >
> > token   :    date -> {new CommonToken(DATE, $text)} | SLASH | DIGITS;
> >
> >
> > I appreciate all the help.
> >
> >
> >
> > On 6/2/10 4:41 PM, "Jim Idle" <jimi at temporal-wave.com> wrote:
> >
> > > This isn't left factored, it is doing the lookahead that you
> > > require to distinguish the two. In v4 this will be different, but
> > > for now, this is what you will need to do.
> > >
> > > Or, don't try to do it in the lexer at all and construct parser
> > > rules for it.
> >




