[antlr-interest] Fwd: Semantic predicate losing token/char position on error

Tue Jun 8 08:40:47 PDT 2010

Hi Jim,

The semantic predicate was a red herring.  First of all, no viable alt
exception is reliably giving me a token, which makes sense to me.

Secondly, changing the grammar is what leads to my problem and I think I
know why, though I don't know how to get around it.  If my grammar appears
as follows:

var        :    var_id
        |    var_id LEFT_PAREN args RIGHT_PAREN        -> ^(CALL var_id args)
        ;

and I enter "a ab", the token I get from no viable alt exception is, in
fact, "ab", as I expect.  If I rewrite the grammar as follows:

var        :    var_id
        (    LEFT_PAREN args RIGHT_PAREN        -> ^(CALL var_id args)
        |    -> var_id
        )
        ;

and I enter "a ab", the token I get from no viable alt exception is "".
Indeed, token->start == token->stop == parser->input_start.  It is not as
surprising to me that token->start == token->stop as that they equal
parser->input_start.  In thinking about this, it seems the parser gets the
next token, "ab", which successfully matches var_id.  It then has to get
another token (consuming "ab") to distinguish between the cases presented
before it.  We are at the end of input, so, of course, the next token it
gets is "", but I would have expected there to be a difference between
token->start and parser->input_start of 4.

Tracing through the generated code, I see a different picture.  I thought
the problem was that it consumed the token.  The problem is really that it
never consumed any tokens at all.  That is why token->start ==
parser->input_start.  My full definition of var_id is:

var_id        :    ID
        |    ID (DOT ID)+    -> ^(DOT ID+)
        |    DOT^ ID
        ;

The code generated for this switches on LA(1) to test for ID or DOT.  Within
case ID, it then switches on LA(2) to decide between the first two
alternatives.  Here it fails because LA(2) is also ID and it throws a no
viable alt exception.  Unfortunately, since it made this decision based on
LA(1) and LA(2), the actual token that caused the problem has not been
identified; instead everything still points to the start of the input.  Is
there any way to recover this information?
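
To make the lookahead behaviour concrete, here is a minimal, self-contained sketch in plain C.  It does not use the ANTLR3 C runtime: the Token and TokenStream types, the LT() helper, and predict_var_id() are all hypothetical stand-ins, and the follow set for a lone ID (LEFT_PAREN or end of input) is an assumption.  It only illustrates that a decision made by peeking at LA(1) and LA(2) leaves the stream index where it was, and that the offending token can be found by looking ahead to the depth at which prediction failed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token types and records; NOT the ANTLR3 C runtime types. */
enum { T_EOF, T_ID, T_DOT, T_LEFT_PAREN };

typedef struct {
    int    type;
    size_t start;   /* offset of the first character in the input */
    size_t stop;    /* offset of the last character in the input  */
} Token;

typedef struct {
    const Token *tokens;
    size_t       count;   /* number of tokens, including the trailing T_EOF */
    size_t       index;   /* next token to be consumed */
} TokenStream;

/* Peek k tokens ahead (k >= 1) without consuming; clamps to the EOF token. */
static const Token *LT(const TokenStream *ts, size_t k)
{
    assert(k >= 1 && ts->count > 0);
    size_t i = ts->index + k - 1;
    return &ts->tokens[i < ts->count ? i : ts->count - 1];
}

/*
 * Predict an alternative of var_id from LA(1) and LA(2) only.
 * Returns 1..3 for a viable alternative, or 0 for "no viable alt",
 * in which case *depth receives the lookahead depth that failed.
 * Nothing is consumed: ts->index is never modified here.
 */
static int predict_var_id(const TokenStream *ts, size_t *depth)
{
    switch (LT(ts, 1)->type) {
    case T_DOT:
        return 3;                        /* DOT^ ID */
    case T_ID:
        switch (LT(ts, 2)->type) {
        case T_DOT:        return 2;     /* ID (DOT ID)+ */
        case T_LEFT_PAREN:               /* assumed follow set of a lone ID */
        case T_EOF:        return 1;     /* ID */
        default:           *depth = 2; return 0;
        }
    default:
        *depth = 1;
        return 0;
    }
}

/* The token to report is the one at the failing depth, not LT(1). */
static const Token *offending_token(const TokenStream *ts, size_t depth)
{
    return LT(ts, depth);
}
```

For the input "a ab" (ID at offset 0, ID at offsets 2-3, then end of input), prediction fails at depth 2 while the stream index is still 0, so an error handler that reports the current position points at the start of the input, whereas offending_token() returns the "ab" token.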

By the way, for the [first] grammar that properly returns the erroneous
token, cdfa18.predict is called instead, which does consume tokens.  The
first grammar, however, is unable to disambiguate the cases I have to parse
("X" as a 0-argument function call vs. "X" as a variable).

Thanks again,

Karim

---------- Forwarded message ----------
From: Karim Chichakly <karimc17 at gmail.com>
Date: Mon, Jun 7, 2010 at 2:46 PM
Subject: Re: [antlr-interest] Fwd: Semantic predicate losing token/char
position on error
To: "antlr-interest at antlr.org interest" <antlr-interest at antlr.org>

Hi Jim,

Thank you!  I did not realize you could write a rule like that in ANTLR.

Re: No viable alt exception:  I can see that the parser has no idea about
what kind of token it is, but didn't the lexer pull a token off?  [If not,
what is the parser trying to match?]  Where is that token?  I am guessing
that this will be moot after I change the grammar as you suggest since I was
getting that token (with the same error) before I put the leading predicate
in.

Thanks,

Karim

---------- Forwarded message ----------
From: Jim Idle <jimi at temporal-wave.com>
Date: Mon, Jun 7, 2010 at 2:12 PM
Subject: Re: [antlr-interest] Fwd: Semantic predicate losing token/char
position on error
To: "antlr-interest at antlr.org interest" <antlr-interest at antlr.org>

With no viable alt, there is no token to inspect as there was no token
missing etc. You can use the bitmap of expected tokens to say what tokens
could be there at that point. Hence there is no token in the exception as
there is no specific token that is in error. At least off the top of my head
that is the case.

You are approaching the problem from the wrong end:

varorFunc
 : i=IDENT
       (
          LPAREN fa=funcArgs? RPAREN
             { /* you could issue an error here if $i is not a function, or
                  wait until the tree walk */ }
           -> ^(FUNCTION $i $fa?)

        | -> {isFunction($i)}? ^(FUNCTION $i)
          -> $i
       )
  ;

You can get an IDENT with or without function parameters and the syntax
(which is what your parser is concerned with) is always valid. Later you can
verify if the names that were used were valid functions and issue a much
nicer message than the parser could generate alone.
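
As a rough illustration of that later check, the following self-contained C sketch walks a toy tree after parsing.  The Node type, the RESERVED_FUNCTIONS table (with made-up names "PI" and "TIME"), and verify() are hypothetical and are not part of ANTLR or of the grammar in this thread.  A bare identifier that names a reserved function is relabelled as a zero-argument call, and call syntax applied to a name that is not a function is counted as an error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical AST node; a real ANTLR tree would be walked through its own API. */
enum NodeType { NODE_VAR, NODE_FUNCTION };

typedef struct Node {
    enum NodeType  type;
    const char    *name;
    struct Node  **children;
    size_t         child_count;
} Node;

/* Made-up reserved function names standing in for a symbol table. */
static const char *const RESERVED_FUNCTIONS[] = { "PI", "TIME" };

static int is_function(const char *name)
{
    size_t n = sizeof RESERVED_FUNCTIONS / sizeof RESERVED_FUNCTIONS[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(name, RESERVED_FUNCTIONS[i]) == 0)
            return 1;
    return 0;
}

/*
 * Verification pass: fix up node types and return the number of errors.
 * The parser accepted the syntax; the semantic decision is made here.
 */
static size_t verify(Node *node)
{
    size_t errors = 0;

    if (node->type == NODE_VAR && is_function(node->name))
        node->type = NODE_FUNCTION;      /* bare name of a function: zero-argument call */
    else if (node->type == NODE_FUNCTION && !is_function(node->name))
        errors++;                        /* call syntax on something that is not a function */

    for (size_t i = 0; i < node->child_count; i++)
        errors += verify(node->children[i]);

    return errors;
}
```

Because the error is detected in the walk rather than in the parser, the message can name the identifier and say exactly what is wrong with it.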

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Karim Chichakly
> Sent: Monday, June 07, 2010 10:46 AM
> To: antlr-interest at antlr.org interest
> Subject: [antlr-interest] Fwd: Semantic predicate losing token/char
> position on error
>
> Hi Jim,
>
> Thank you.  I am sorry, but I completely missed that on the support
> page.
>
> I understand your point (and thanks for the tip about pANTLR3_STRING),
> but in your example, what is funcCall?  In my full grammar, I also have
> a branch that looks for var_id(args), so perhaps funcCall : (args)?
> However, the problem I have is that the grammar I am parsing allows an
> identifier by itself (i.e., no distinguishing syntactical features, such
> as parens) to represent either a variable or a zero-argument function
> call.  All function names are reserved, so I can distinguish zero-
> argument function calls from variables via a symbol table lookup.
>
> In the spirit of what you are saying, I think I would have to pass the
> var_ids through as var_ids and then do the lookup in a follow-on pass
> that modifies the AST as needed.  Is this really the best way, i.e., to
> add another pass?
>
> I enclose my nascent error handler.  As you can see, I am trying to
> supply uniform behavior rather than do different things based on the
> specific error (all I want is a clear indication of what went wrong and
> the position where it went wrong).  Perhaps this is folly.  The error
> in this case was ANTLR3_NO_VIABLE_ALT_EXCEPTION.
>
> Thanks again,
>
> Karim
>
>
> ---------- Forwarded message ----------
> From: Jim Idle <jimi at temporal-wave.com>
> Date: Mon, Jun 7, 2010 at 1:02 PM
> Subject: Re: [antlr-interest] Semantic predicate losing token/char
> position on error
> To: "antlr-interest at antlr.org interest" <antlr-interest at antlr.org>
>
>
>
>
> > -----Original Message-----
> > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> > bounces at antlr.org] On Behalf Of Karim Chichakly
> > Sent: Monday, June 07, 2010 8:44 AM
> > To: antlr-interest at antlr.org interest
> > Subject: [antlr-interest] Semantic predicate losing token/char
> > position on error
> >
> > Hi,
> >
> > Thank you again for your previous help.  I now know about
> > antlr.markmail.org (perhaps a link from www.antlr.org would help
> > others)
>
> You mean like the one on the support page with a box that you can type
> your search terms in and a logo saying "Mark mail"? ;-)
>
>
> > If, however, I add a semantic predicate to that grammar (enclosed) to
> > distinguish between X as a function call and X as a variable (which is
> > described starting on page 297 of the Definitive ANTLR Reference), I
> > no longer get a character position.  All four of the variables
> > involved in the position calculation are set to 1, and the start and
> > stop then become zero.
> > These values are, by the way, a bit peculiar as these fields usually
> > hold pointers into the text.  I also note that token->input is now
> > NULL.
>
> Well, though this might be shown as an example in the book it isn't
> really the way to do things. You are trying to make a semantic
> distinction via syntax rules and that is always going to give you a
> headache. You should parse as:
>
> var_id:
>        ( funcCall -> ^(FUNCTION var_id funcCall)
>         | -> var_id
>      )
>   ;
>
> Then check to see if the function construct really was a function when
> you walk the tree in a verification pass.
>
> I need to see your error reporting function to help you more on the
> display stuff. It is likely that you are trying to use elements that
> are not valid for the type of error you are being passed. Not all
> elements are available for all errors.
>
> Finally, do not use the pANTLR3_STRING stuff unless your grammar is
> just a small single-shot parse as you will create a new string every
> time you run that predicate! Call a function, use LT() to get the next
> token, then use the pointers in the token directly. You will use no
> memory that way!
>
> Jim
>
>
>
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
