[antlr-interest] Fwd: Semantic predicate losing token/char position on error

Jim Idle jimi at temporal-wave.com
Tue Jun 8 09:41:11 PDT 2010


I think you need to change your var_id rule. Try this:

var_id
	: ID (DOT^ ID)*;

This is properly left-factored, and it will also produce a tree that is much easier to resolve in DOT notation.
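As a rough illustration of why the left-factored rule helps, here is a hand-written C sketch of what ID (DOT ID)* amounts to. It is not ANTLR's generated code; the token kinds, the Tok struct and parse_var_id are hypothetical names made up for this example. The point is that every decision needs only LA(1), so when something goes wrong the offending token is simply the current one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds; a generated ANTLR parser has its own constants. */
enum TokKind { TK_ID, TK_DOT, TK_EOF };

typedef struct {
    enum TokKind kind;
    size_t offset;  /* character offset of the token in the input */
} Tok;

/* Hand-written equivalent of  var_id : ID (DOT ID)* ;
   Starts at toks[*pos]; the array must end with a TK_EOF token.
   Returns the number of ID segments matched, or -1 on a syntax error,
   in which case *err_offset holds the offset of the offending token.
   Because the rule is left-factored, every decision looks only at LA(1). */
static int parse_var_id(const Tok *toks, size_t *pos, size_t *err_offset)
{
    assert(toks != NULL && pos != NULL && err_offset != NULL);
    if (toks[*pos].kind != TK_ID) {
        *err_offset = toks[*pos].offset;
        return -1;
    }
    (*pos)++;
    int segments = 1;
    /* (DOT ID)* : keep looping while LA(1) is DOT */
    while (toks[*pos].kind == TK_DOT) {
        (*pos)++;
        if (toks[*pos].kind != TK_ID) {
            *err_offset = toks[*pos].offset;
            return -1;
        }
        (*pos)++;
        segments++;
    }
    return segments;
}
```

For the input "a ab" the function matches only "a" and returns 1, leaving the ID "ab" (offset 2) as the next token for the caller to reject.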

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Karim Chichakly
> Sent: Tuesday, June 08, 2010 8:41 AM
> To: antlr-interest at antlr.org interest
> Subject: Re: [antlr-interest] Fwd: Semantic predicate losing token/char position on error
> 
> Hi Jim,
> 
> The semantic predicate was a red herring. First of all, the no viable alt
> exception is reliably giving me a token, which makes sense to me.
> 
> Secondly, changing the grammar is what leads to my problem, and I think I
> know why, though I don't know how to get around it. If my grammar appears
> as follows:
> 
> var        :    var_id
>         |    var_id LEFT_PAREN args RIGHT_PAREN        -> ^(CALL var_id args)
>         ;
> 
> and I enter "a ab", the token I get from no viable alt exception is, in
> fact, "ab", as I expect.  If I rewrite the grammar as follows:
> 
> var        :    var_id
>         (    LEFT_PAREN args RIGHT_PAREN        -> ^(CALL var_id args)
>         |    -> var_id
>         )
>         ;
> 
> and I enter "a ab", the token I get from no viable alt exception is "".
> Indeed, token->start == token->stop == parser->input_start. It is less
> surprising to me that token->start == token->stop than that they equal
> parser->input_start. In thinking about this, it seems the parser gets the
> next token, "ab", which successfully matches var_id. It then has to get
> another token (consuming "ab") to distinguish between the cases presented
> before it. We are at the end of input, so, of course, the next token it
> gets is "", but I would have expected token->start - parser->input_start
> to be 4.
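Karim's expected position can be checked with a small C sketch of the offset arithmetic. It assumes, as described later in this thread, that a token's start field is a char pointer into the input buffer; char_offset and line_and_column are hypothetical helpers, not runtime functions. For "a ab" the end-of-input position lies 4 characters past the start of the input.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of a token from the start of the input, assuming the token's
   start field is a char pointer into the same buffer as input_start. */
static ptrdiff_t char_offset(const char *token_start, const char *input_start)
{
    assert(token_start != NULL && input_start != NULL);
    return token_start - input_start;
}

/* Convert a position in the buffer into a 1-based line and column. */
static void line_and_column(const char *input_start, const char *pos,
                            int *line, int *column)
{
    assert(input_start <= pos);
    *line = 1;
    *column = 1;
    for (const char *p = input_start; p < pos; p++) {
        if (*p == '\n') {
            (*line)++;
            *column = 1;
        } else {
            (*column)++;
        }
    }
}
```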
> 
> Tracing through the generated code, I see a different picture. I thought
> the problem was that it consumed the token. The problem is really that it
> never consumed any tokens at all. That is why token->start ==
> parser->input_start. My full definition of var_id is:
> 
> var_id        :    ID
>         |    ID (DOT ID)+    -> ^(DOT ID+)
>         |    DOT^ ID
>         ;
> 
> The code generated for this switches on LA(1) to test for ID or DOT.
> Within case ID, it then switches on LA(2) to decide between the first two
> alternatives. Here it fails because LA(2) is also ID, and it throws a no
> viable alt exception. Unfortunately, since it made this decision based on
> LA(1) and LA(2), the actual token that caused the problem has not been
> identified; instead, everything still points to the start of the input.
> Is there any way to recover this information?
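A C sketch of the idea behind recovering the position: if a prediction looked at LA(1) and LA(2) and failed on LA(2), the token to blame is LT(2), even though nothing has been consumed yet. The Token and TokenStream structs, lt() and error_offset() are hypothetical stand-ins, and the sketch assumes the error reporter somehow knows how deep the failed decision looked (passed in as failed_depth); this thread does not establish whether the generated C code exposes that.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int type;        /* hypothetical token type */
    size_t offset;   /* character offset into the input */
} Token;

typedef struct {
    const Token *toks;
    size_t count;    /* includes a trailing EOF token */
    size_t index;    /* index of LT(1); nothing consumed yet */
} TokenStream;

/* Mimics LT(k) for k >= 1, clamping at the trailing EOF token. */
static const Token *lt(const TokenStream *ts, size_t k)
{
    assert(k >= 1 && ts->count > 0);
    size_t i = ts->index + k - 1;
    if (i >= ts->count)
        i = ts->count - 1;
    return &ts->toks[i];
}

/* When a prediction inspected LA(1)..LA(depth) and failed at LA(depth),
   blame LT(depth), even though index has not moved. */
static size_t error_offset(const TokenStream *ts, size_t failed_depth)
{
    return lt(ts, failed_depth)->offset;
}
```

For "a ab" the stream holds ID "a" at offset 0, ID "ab" at offset 2 and EOF at offset 4, so error_offset(&ts, 2) yields 2 rather than the start of the input.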
> 
> By the way, for the [first] grammar that properly returns the erroneous
> token, cdfa18.predict is called instead, which does consume tokens. The
> first grammar, however, is unable to disambiguate the cases I have to
> parse ("X" as a 0-argument function call vs. "X" as a variable).
> 
> Thanks again,
> 
> Karim
> 
> 
> ---------- Forwarded message ----------
> From: Karim Chichakly <karimc17 at gmail.com>
> Date: Mon, Jun 7, 2010 at 2:46 PM
> Subject: Re: [antlr-interest] Fwd: Semantic predicate losing token/char position on error
> To: "antlr-interest at antlr.org interest" <antlr-interest at antlr.org>
> 
> 
> Hi Jim,
> 
> Thank you! I did not realize you could write a rule like that in ANTLR.
> 
> Re: No viable alt exception: I can see that the parser has no idea about
> what kind of token it is, but didn't the lexer pull a token off? [If not,
> what is the parser trying to match?] Where is that token? I am guessing
> that this will be moot after I change the grammar as you suggest, since I
> was getting that token (with the same error) before I put the leading
> predicate in.
> 
> Thanks,
> 
> Karim
> 
> 
> ---------- Forwarded message ----------
> From: Jim Idle <jimi at temporal-wave.com>
> Date: Mon, Jun 7, 2010 at 2:12 PM
> Subject: Re: [antlr-interest] Fwd: Semantic predicate losing token/char position on error
> To: "antlr-interest at antlr.org interest" <antlr-interest at antlr.org>
> 
> 
> With no viable alt, there is no token to inspect, as there was no token
> missing etc. You can use the bitmap of expected tokens to say what tokens
> could be there at that point. Hence there is no token in the exception, as
> there is no specific token that is in error. At least, off the top of my
> head, that is the case.
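Jim's suggestion of reporting the expected tokens can be sketched in C as follows. This assumes the expected set is available as a simple bitmap, with token types below 64 stored in a uint64_t; the actual bitset type and field name in the ANTLR 3 C runtime are not shown in this thread, and the token_names table is invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token names, indexed by token type (types must be < 64). */
static const char *const token_names[] = {
    "ID", "DOT", "LEFT_PAREN", "RIGHT_PAREN", "COMMA"
};
#define NUM_TOKEN_NAMES (sizeof token_names / sizeof token_names[0])

/* Write the names of all token types whose bit is set in `expected`
   (bit n set means token type n would be acceptable here) into buf as a
   comma-separated list. If buf is too small, the list is cut at the last
   name that fits whole. */
static void describe_expected(uint64_t expected, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    size_t used = 0;
    buf[0] = '\0';
    for (size_t t = 0; t < NUM_TOKEN_NAMES; t++) {
        if (!(expected & (UINT64_C(1) << t)))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used > 0 ? ", " : "", token_names[t]);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0';  /* drop the partial entry */
            break;
        }
        used += (size_t)n;
    }
}
```

An error message such as "expected one of: ID, DOT" then needs no specific offending token at all.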
> 
> You are approaching the problem from the wrong end:
> 
> varorFunc
>  : i=IDENT
>        (
>           LPAREN fa=funcArgs? RPAREN
>              { /* you could issue an error here if $i is not a function, or wait until the tree walk */ }
>            -> ^(FUNCTION $i $fa?)
> 
>         | -> {isFunction($i)}? ^(FUNCTION $i)
>           -> $i
>        )
>   ;
> 
> You can get an IDENT with or without function parameters, and the syntax
> (which is what your parser is concerned with) is always valid. Later you
> can verify whether the names that were used were valid functions and issue
> a much nicer message than the parser could generate alone.
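The isFunction() call in the rule above is left undefined. A minimal sketch of one possible C helper, assuming function names are reserved words held in a sorted table (the names listed are invented) and that the token text is passed as a pointer plus length rather than as a NUL-terminated copy:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reserved function names; must stay sorted for bsearch. */
static const char *const reserved_functions[] = {
    "ABS", "COS", "MAX", "MIN", "SIN", "TIME"
};

typedef struct { const char *text; size_t len; } NameKey;

/* Orders a length-delimited key against a NUL-terminated table entry,
   consistently with strcmp ordering. */
static int cmp_name(const void *key, const void *elem)
{
    const NameKey *k = key;
    const char *name = *(const char *const *)elem;
    size_t name_len = strlen(name);
    size_t n = k->len < name_len ? k->len : name_len;
    int c = memcmp(k->text, name, n);
    if (c != 0)
        return c;
    return (k->len > name_len) - (k->len < name_len);
}

/* Hypothetical isFunction(): is text[0..len) a reserved function name? */
static int is_function(const char *text, size_t len)
{
    assert(text != NULL);
    NameKey key = { text, len };
    return bsearch(&key, reserved_functions,
                   sizeof reserved_functions / sizeof reserved_functions[0],
                   sizeof reserved_functions[0], cmp_name) != NULL;
}
```

The lookup is case-sensitive here; a language with case-insensitive names would need a different comparison.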
> 
> Jim
> 
> 
> 
> > -----Original Message-----
> > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Karim Chichakly
> > Sent: Monday, June 07, 2010 10:46 AM
> > To: antlr-interest at antlr.org interest
> > Subject: [antlr-interest] Fwd: Semantic predicate losing token/char position on error
> >
> > Hi Jim,
> >
> > Thank you. I am sorry, but I completely missed that on the support page.
> >
> > I understand your point (and thanks for the tip about pANTLR3_STRING),
> > but in your example, what is funcCall? In my full grammar, I also have a
> > branch that looks for var_id(args), so perhaps funcCall : (args)?
> > However, the problem I have is that the grammar I am parsing allows an
> > identifier by itself (i.e., with no distinguishing syntactic features,
> > such as parens) to represent either a variable or a zero-argument
> > function call. All function names are reserved, so I can distinguish
> > zero-argument function calls from variables via a symbol table lookup.
> >
> > In the spirit of what you are saying, I think I would have to pass the
> > var_ids through as var_ids and then do the lookup in a follow-on pass
> > that modifies the AST as needed. Is this really the best way, i.e., to
> > add another pass?
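The follow-on pass described here could look roughly like this C sketch. It assumes a hypothetical Node struct rather than the runtime's own tree type (in ANTLR 3 such a pass would more usually be written as a tree grammar), and it retypes any bare identifier that names a reserved function as a zero-argument FUNCTION node. The reserved list and node types are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical AST node types and layout. */
enum NodeType { N_VAR, N_FUNCTION, N_PLUS };

typedef struct Node {
    enum NodeType type;
    const char *text;          /* identifier text for N_VAR / N_FUNCTION */
    struct Node **children;
    size_t child_count;
} Node;

/* Hypothetical reserved-name check; a real one would consult the symbol table. */
static int is_reserved_function(const char *name)
{
    static const char *const names[] = { "TIME", "PI", "RANDOM" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (strcmp(names[i], name) == 0)
            return 1;
    return 0;
}

/* Follow-on pass: every bare identifier (N_VAR) that names a reserved
   function becomes a zero-argument N_FUNCTION node.
   Returns the number of nodes rewritten. */
static size_t resolve_zero_arg_calls(Node *n)
{
    assert(n != NULL);
    size_t rewritten = 0;
    if (n->type == N_VAR && is_reserved_function(n->text)) {
        n->type = N_FUNCTION;
        rewritten++;
    }
    for (size_t i = 0; i < n->child_count; i++)
        rewritten += resolve_zero_arg_calls(n->children[i]);
    return rewritten;
}
```

Because the parser never has to decide between a variable and a function, the grammar stays free of predicates and the pass can report unknown names with full context.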
> >
> > I enclose my nascent error handler. As you can see, I am trying to
> > supply uniform behavior rather than do different things based on the
> > specific error (all I want is a clear indication of what went wrong and
> > the position where it went wrong). Perhaps this is folly. The error in
> > this case was ANTLR3_NO_VIABLE_ALT_EXCEPTION.
> >
> > Thanks again,
> >
> > Karim
> >
> >
> > ---------- Forwarded message ----------
> > From: Jim Idle <jimi at temporal-wave.com>
> > Date: Mon, Jun 7, 2010 at 1:02 PM
> > Subject: Re: [antlr-interest] Semantic predicate losing token/char position on error
> > To: "antlr-interest at antlr.org interest" <antlr-interest at antlr.org>
> >
> >
> >
> >
> > > -----Original Message-----
> > > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Karim Chichakly
> > > Sent: Monday, June 07, 2010 8:44 AM
> > > To: antlr-interest at antlr.org interest
> > > Subject: [antlr-interest] Semantic predicate losing token/char position on error
> > >
> > > Hi,
> > >
> > > Thank you again for your previous help. I now know about
> > > antlr.markmail.org (perhaps a link from www.antlr.org would help
> > > others)
> >
> > You mean like the one on the support page with a box that you can type
> > your search terms in and a logo saying "Mark mail"? ;-)
> >
> >
> > > If, however, I add a semantic predicate to that grammar (enclosed) to
> > > distinguish between X as a function call and X as a variable (which is
> > > described starting on page 297 of The Definitive ANTLR Reference), I
> > > no longer get a character position. All four of the variables
> > > involved in the position calculation are set to 1, and the start and
> > > stop then become zero. These values are, by the way, a bit peculiar,
> > > as these fields usually hold pointers into the text. I also note that
> > > token->input is now NULL.
> >
> > Well, though this might be shown as an example in the book, it isn't
> > really the way to do things. You are trying to make a semantic
> > distinction via syntax rules, and that is always going to give you a
> > headache. You should parse as:
> >
> > var_id:
> >        ( funcCall -> ^(FUNCTION var_id funcCall)
> >         | -> var_id
> >      )
> >   ;
> >
> > Then check to see if the function construct really was a function when
> > you walk the tree in a verification pass.
> >
> > I need to see your error reporting function to help you more with the
> > display stuff. It is likely that you are trying to use elements that are
> > not valid for the type of error you are being passed. Not all elements
> > are available for all errors.
> >
> > Finally, do not use the pANTLR3_STRING stuff unless your grammar is just
> > a small single-shot parse, as you will create a new string every time
> > you run that predicate! Call a function, use LT() to get the next token,
> > then use the pointers in the token directly. You will use no memory that
> > way!
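Jim's advice about using the token's pointers directly can be sketched in C like this. It assumes the predicate already has the token (for example from LT(1)) and that its start and stop fields point at the first and last characters of the token text, with stop inclusive; that layout is an assumption, and TokView and token_text_equals are hypothetical names. No string is allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token view: start and stop point at the first and last
   characters of the token text inside the input buffer (stop inclusive). */
typedef struct {
    const char *start;
    const char *stop;
} TokView;

/* Compare a token's text with a NUL-terminated name without copying it. */
static int token_text_equals(const TokView *tok, const char *name)
{
    assert(tok != NULL && name != NULL);
    size_t len = (size_t)(tok->stop - tok->start) + 1;
    /* Check the length first so memcmp never reads past either string. */
    return strlen(name) == len && memcmp(tok->start, name, len) == 0;
}
```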
> >
> > Jim
> >
> >
> >
> >
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > Unsubscribe:
> > http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> 




