[antlr-interest] [C Runtime] Token issues (predicates and indices) and composition tree grammars
Kurt Otte
kurtotte at gmail.com
Tue Aug 26 14:45:58 PDT 2008
Jim,
On Mon, Aug 25, 2008 at 5:12 PM, Jim Idle <jimi at temporal-wave.com> wrote:
> On Mon, 2008-08-25 at 15:53 -0600, Kurt Otte wrote:
>
> -------------------------------------------------------
>
> *Validating predicates in the tree grammars return incorrect token info*
>
> To reproduce this, run: ex1.exe test/tree_validating_error.txt
>
> This produces the following output:
> =====
> (BAR (CNAME (VAR (CNAME a) (CNAME b) c) (CNAME _a)) foo)
> -Imaginary-(0) : error 6 : (0), at offset 0, near UP : syntax not recognized...
>
> tree_validating_error.txt(4) : error 3 : , at offset 11, near foo : cannot match to any predicted input...
>
> I'll look, but I think this is a grammar error. Probably trying to match
> ^(TOKEN)
>
Thank you. It would appear that when the validating predicate fails, it
does not create a failed predicate exception with the right token. Instead, it
seems to fail down a different path and ends up with a mismatched token
exception carrying the wrong token.
>
> =====
>
> Looking at the grammar in ex1walker.g, I have the following validating
> predicate:
>
> // force this false to trigger an error
> var_cname
> : ^(CNAME NAME {(0)}? cname?)
> ;
>
>
>
> This forces the match to fail because the predicate returns false
> (hard-coded in this example). Which token should the error be reported on? I think
> there is some confusion here between a disambiguating semantic predicate and
> a validating semantic predicate. It seems the disambiguating semantic
> predicate wants the error on the next token, but the validating
> semantic predicate wants it on the previous token. However, when I
> walk through the code, the function antlr3RecognitionExceptionNew seems
> to always grab the next token. In my example, this token ends up being the
> imaginary UP token, which leads to a confusing, unhelpful error message.
> Is it possible to look up different tokens in the error handler?
>
> The error handler is really just a template. I can't predict what you need
> from it, especially in tree walkers. Override the handler function with your
> own. The default is designed to be helpful to a grammar programmer and as
> such will confuse the hell out of your users. You should not be getting
> recognition errors on your AST, though, and unless there is some really strange
> reason to have them, you should not need syntactic predicates, as you should
> produce an unambiguous AST.
>
I have overridden the error handler, but how do I get a different token
once I am inside the error handler? For example, if I detect that the
token is an UP, I would prefer to back up to the closest real token to get
context. If it is a DOWN, I would prefer to move forward to the next real
token; however, at this point I would settle for any real token so I
can tell which line the problem is on. I have
experimented with _LT(yada,-1) and _LA(yada,-1); however, they don't appear
to work when called from the error handler. What is the recommended
way to get a different token from the stream in an error handler?
>
>
>
> Changing antlr3RecognitionExceptionNew to call _LT(tns,-1) for the
> ANTLR3_COMMONTREENODE case (changing 1 to -1 to get the previous token) seems to
> fix the problem for this particular example, but it won't work in the general
> case. With this change, I get the output I expected:
>
> test/tree_validating_error.txt(4) : error 6 : (0), at offset 2, near a : syntax not recognized...
>
> It seems there needs to be a way to change the error handling depending on
> whether we are in a disambiguating predicate or a validating predicate, but
> I am not sure how that would be done.
>
> I think that you should be able to produce an AST that does not need
> disambiguation, but there are times when you might need this for partial
> tree matching, I suppose.
>
I use this to verify that symbols have been previously declared, so I need to
write a bit of code that checks internal data structures to make sure the
statement refers to a previously declared symbol.
Thanks,
Kurt