[antlr-interest] [C Runtime] Token issues (predicates and indices) and composition tree grammars

Kurt Otte kurtotte at gmail.com
Tue Aug 26 14:45:58 PDT 2008


Jim,

On Mon, Aug 25, 2008 at 5:12 PM, Jim Idle <jimi at temporal-wave.com> wrote:

>  On Mon, 2008-08-25 at 15:53 -0600, Kurt Otte wrote:
>
>  -------------------------------------------------------
>
> *Validating predicates in the tree grammars return incorrect token info*
>
> To reproduce this run: ex1.exe test/tree_validating_error.txt
>
> This produces the following output:
> =====
> (BAR (CNAME (VAR (CNAME a) (CNAME b) c) (CNAME _a)) foo)
> -Imaginary-(0)  : error 6 : (0), at offset 0, near UP : syntax not
> recognized...
>
> tree_validating_error.txt(4)  : error 3 : , at offset 11, near foo : cannot
> match to any predicted input...
>
> I'll look but I think this is a grammar error. Probably trying to match
> ^(TOKEN)
>

Thank you.   It would appear that when the validating predicate fails it
doesn't create a failed predicate exception with the right token.  Rather it
seems to fail down a different path and ends up with a mismatched token
exception with the wrong token.

>
>  =====
>
> Looking at the grammar in ex1walker.g, I have the following validation
> predicate:
>
> // force this false to trigger an error
> var_cname
>  : ^(CNAME NAME {(0)}? cname?)
> ;
>
>
>
> This forces a token to fail due to the predicate returning false
> (hard-coded in this example).  What token should the error be on?  I think
> there is some confusion here between a disambiguating semantic predicate and
> a validating semantic predicate.  It seems the disambiguating semantic
> predicate wants the error to be on the next token, but the validating
> semantic predicate wants the error on the previous token.  However, when
> walking through the code, the function antlr3RecognitionExceptionNew seems
> to always grab the next token.  In my example, this token ends up being the
> imaginary UP token.  This leads to a confusing non-helpful error message.
> Is it possible to look up different tokens in the error handler?
>
> The error handler is really just a template. I can't predict what you need
> from it, especially in tree walkers. Override the handler function with your
> own. The default is designed to be helpful to a grammar programmer and as
> such will confuse the hell out of your users. You should not be getting
> recognition errors on your AST though and unless there is some real strange
> reason to have them, you should not need syntactic predicates as you should
> produce an unambiguous AST.
>

I have overridden the error handler, but how do I get a different token
once I am inside it?  For example, if I detect that the token is an UP, I
would prefer to back up to the closest real token to get context.  If it is
a DOWN, I would prefer to move forward to the next real token; however, at
this point I would settle for any real token so that I can report which
line the problem is on.  I have experimented with _LT(yada,-1) and
_LA(yada,-1); however, they don't appear to work when called from the error
handler.  What is the recommended way to get a different token from the
stream in an error handler?
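
To make the question concrete, here is a rough sketch of the lookup I am
after.  It assumes the tree node stream can be viewed as a flat array of
nodes; the Node struct, the TOK_DOWN/TOK_UP values, and nearest_real_node
are hypothetical stand-ins for illustration, not the ANTLR3 C runtime API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the ANTLR3 C runtime types. */
enum { TOK_DOWN = 2, TOK_UP = 3 };  /* illustrative values for imaginary nodes */

typedef struct {
    int         type;   /* token type; TOK_DOWN and TOK_UP are imaginary */
    int         line;   /* source line; 0 for imaginary nodes */
    const char *text;
} Node;

static int is_real(const Node *n)
{
    return n->type != TOK_DOWN && n->type != TOK_UP;
}

/* Scan backwards from pos (exclusive) for a real node; NULL if none. */
static const Node *real_before(const Node *nodes, size_t pos)
{
    while (pos-- > 0) {
        if (is_real(&nodes[pos])) return &nodes[pos];
    }
    return NULL;
}

/* Scan forwards from pos (exclusive) for a real node; NULL if none. */
static const Node *real_after(const Node *nodes, size_t count, size_t pos)
{
    for (size_t i = pos + 1; i < count; i++) {
        if (is_real(&nodes[i])) return &nodes[i];
    }
    return NULL;
}

/*
 * For an UP node prefer the closest earlier real node; for a DOWN node
 * prefer the next real node.  Fall back to the other direction so that
 * some real node is returned whenever one exists.
 */
const Node *nearest_real_node(const Node *nodes, size_t count, size_t pos)
{
    const Node *found;

    assert(nodes != NULL);
    if (pos >= count) return NULL;
    if (is_real(&nodes[pos])) return &nodes[pos];

    if (nodes[pos].type == TOK_UP) {
        found = real_before(nodes, pos);
        return found ? found : real_after(nodes, count, pos);
    }
    found = real_after(nodes, count, pos);
    return found ? found : real_before(nodes, pos);
}
```

With something like this, the handler could report the line of the returned
node instead of the imaginary UP.  What I am missing is the runtime call
that gives the handler access to the neighbouring nodes in the first place.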

>
>
>
> Changing antlr3RecognitionExceptionNew to call _LT(tns,-1) for the
> ANTLR3_COMMONTREENODE case (change 1 to -1 to get previous token) seems to
> fix the problem for this particular example, but won't work for the general
> case.  This change causes the following expected output:
>
> test/tree_validating_error.txt(4)  : error 6 : (0), at offset 2, near a :
> syntax not recognized...
>
> It seems there needs to be a way to change the error handling depending on
> whether we are in a disambiguating predicate or a validating predicate, but
> I am not sure how that would be done.
>
> I think that you should be able to produce an AST that does not need
> disambiguation, but there are times when you might need this for partial
> tree matching I suppose.
>

I use this to verify that symbols have been previously declared, so I need
to write a bit of code that checks internal data structures to make sure the
statement refers to a previously declared symbol.
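
The check itself is simple.  A simplified sketch follows, assuming a fixed
size table of names; SymbolTable, symtab_declare, and symtab_is_declared are
hypothetical names for illustration, and the table stores the caller's
pointers rather than copying the strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMBOLS 64  /* arbitrary capacity for the sketch */

typedef struct {
    const char *names[MAX_SYMBOLS];  /* borrowed pointers, not copies */
    size_t      count;
} SymbolTable;

/* Record a declaration; returns 1 on success, 0 if the table is full. */
int symtab_declare(SymbolTable *t, const char *name)
{
    assert(t != NULL && name != NULL);
    if (t->count >= MAX_SYMBOLS) return 0;
    t->names[t->count++] = name;
    return 1;
}

/* Returns 1 if name was declared earlier, 0 otherwise. */
int symtab_is_declared(const SymbolTable *t, const char *name)
{
    assert(t != NULL && name != NULL);
    for (size_t i = 0; i < t->count; i++) {
        if (strcmp(t->names[i], name) == 0) return 1;
    }
    return 0;
}
```

The validating predicate would then call symtab_is_declared with the text of
the NAME node in place of the hard-coded {(0)}? above.  How that text is
pulled out of the node is runtime specific and not shown here.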

Thanks,

Kurt