[antlr-interest] [C Runtime] Token issues (predicates and indices) and composition tree grammars

Jim Idle jimi at temporal-wave.com
Mon Aug 25 16:12:01 PDT 2008


On Mon, 2008-08-25 at 15:53 -0600, Kurt Otte wrote:
> Jim,
> 
> I have hit a series of issues which I think may be bugs in the C
> runtime.  I have extracted simple examples from my grammar to make it
> easier to show the problem and have attached them to this email.
> 
> Is there a high level doc explaining the architecture/design of the
> runtime files?  Is there a good way to ramp up on how it works besides
> simply looking through the code?  I was going to try to help you debug
> this, but I am finding it a bit daunting.



It is easier just to send me examples. However, the Doxygen API docs may
help you. Follow the link to the API docs from the ANTLR home page.

> 
> The examples attached should compile and run for you although you will
> probably need to tweak the makefile with your paths and/or compilers.
> The examples are a quick and dirty solution to show the problem so
> they don't have a lot of polish to work in an arbitrary environment.  

The smaller the example, the better.

> -------------------------------------------------------
> 
> Validating predicates in the tree grammars return incorrect token info
> 
> To reproduce this run: ex1.exe test/tree_validating_error.txt
> 
> This produces the following output:
> =====
> (BAR (CNAME (VAR (CNAME a) (CNAME b) c) (CNAME _a)) foo)
> -Imaginary-(0)  : error 6 : (0), at offset 0, near UP : syntax not
> recognized...
> 
> tree_validating_error.txt(4)  : error 3 : , at offset 11, near foo :
> cannot match to any predicted input...

I'll look but I think this is a grammar error. Probably trying to match
^(TOKEN)


> =====
> 
> Looking at the grammar in ex1walker.g, I have the following validation
> predicate:
> 
> // force this false to trigger an error
> var_cname
>  : ^(CNAME NAME {(0)}? cname?)
> ;



> 
> This forces a token to fail due to the predicate returning false
> (hard-coded in this example).  What token should the error be on?  I
> think there is some confusion here between a disambiguating semantic
> predicate and a validating semantic predicate.  It seems the
> disambiguating semantic predicate wants the error to be on the next
> token, but the validating semantic predicate wants the error on the
> previous token.  However, when walking through the code, the function
> antlr3RecognitionExceptionNew seems to always grab the next token.  In
> my example, this token ends up being the imaginary UP token.  This
> leads to a confusing, unhelpful error message.  Is it possible to
> look up different tokens in the error handler?  

The error handler is really just a template. I can't predict what you
need from it, especially in tree walkers. Override the handler function
with your own. The default is designed to be helpful to a grammar
programmer and as such will confuse the hell out of your users. You
should not be getting recognition errors on your AST, though. Unless
there is some really strange reason to have them, you should not need
syntactic predicates, as you should produce an unambiguous AST.
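
A minimal sketch of the "override the handler" pattern described above,
written against a hypothetical stand-in struct rather than the real
runtime headers. Every demo_* name is an assumption made for
illustration. In the actual C runtime the hook to replace is, as far as
I know, the displayRecognitionError function pointer on the base
recognizer; check the API docs for its exact location and signature.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a recognizer; NOT the real runtime struct. */
typedef struct demo_recognizer demo_recognizer;
struct demo_recognizer {
    int         line;       /* line of the offending node */
    const char *near_text;  /* text of the node the error is reported near */
    /* Replaceable reporting hook; a default is installed at init time. */
    void (*display_error)(const demo_recognizer *rec, char *out, size_t out_len);
};

/* Default report: terse, aimed at the grammar author. */
static void demo_default_display(const demo_recognizer *rec, char *out, size_t out_len)
{
    snprintf(out, out_len, "(%d) : error : near %s : syntax not recognized...",
             rec->line, rec->near_text);
}

/* User-supplied report: phrased for the end user of the tool. */
static void demo_user_display(const demo_recognizer *rec, char *out, size_t out_len)
{
    snprintf(out, out_len, "line %d: '%s' is not a valid name here",
             rec->line, rec->near_text);
}

/* Set up a recognizer with the default hook installed. */
static void demo_init(demo_recognizer *rec, int line, const char *near_text)
{
    memset(rec, 0, sizeof *rec);
    rec->line = line;
    rec->near_text = near_text;
    rec->display_error = demo_default_display;
}
```

After demo_init(), assigning rec.display_error = demo_user_display
switches every later report to the user-facing wording without touching
the generated code. The custom handler is also the place to decide
which token or node to report against.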


> 
> Changing antlr3RecognitionExceptionNew to call _LT(tns,-1) for the
> ANTLR3_COMMONTREENODE case (change 1 to -1 to get previous token)
> seems to fix the problem for this particular example, but won't work
> for the general case.  This change causes the following expected
> output:
> 
> test/tree_validating_error.txt(4)  : error 6 : (0), at offset 2, near
> a : syntax not recognized...
> 
> It seems there needs to be a way to change the error handling
> depending on whether we are in a disambiguating predicate or a
> validating predicate, but I am not sure how that would be done.

I think that you should be able to produce an AST that does not need
disambiguation, but there are times when you might need this for partial
tree matching, I suppose.

> 
> -------------------------------------------------------
> 
> Tokens have incorrect start and stop positions
> 
> To reproduce this, run: ex1.exe test/start_stop_error.txt
> 
> For example, if I force a syntax error in the file, I get the following
> error
> =====
> start_stop_error.txt(3)  : error 3 : , at offset 0
>    near [Index: 0 (Start: 3497941-Stop: 3497941) ='a', type<10> Line:
> 3 LinePos:0]
> =====
> 
> Note the start and stop positions are way off.  I think there were
> some similar posts to the list about this problem, but I did not see a
> conclusion to the thread so I included an example to reproduce it.

Ah, I see what people are doing. These are absolute addresses, not
offsets! See the API docs.
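
A minimal sketch of turning such absolute addresses back into offsets,
assuming the start and stop values are addresses inside the input
buffer. The demo_token struct and demo_offset helper are hypothetical
names for illustration only. In the real runtime the base address would
presumably come from the input stream; the API docs give the actual
field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the token fields discussed above: start and
 * stop hold addresses into the input buffer, not offsets. */
typedef struct {
    intptr_t start;  /* address of the first character of the token */
    intptr_t stop;   /* address of the last character of the token  */
} demo_token;

/* Convert an absolute address into a zero-based offset from the start
 * of the input buffer. */
static ptrdiff_t demo_offset(const char *base, intptr_t addr)
{
    assert(base != NULL);
    return (const char *)addr - base;
}
```

For a one-character token such as 'a', demo_offset() returns the same
value for start and stop, which matches the identical Start and Stop
numbers in the report above.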

> -------------------------------------------------------
> 
> Composition Tree Grammars won't compile
> 
> This looks like a simple string template issue of getting pParser
> rather than pTreeParser.
> 
> Here is the error:
> =====
> gen/ex1walker.c(314) : error C2039: 'pParser' : is not a member of
> 'ex1walker_tree_helper_Ctx_struct'
>        gen\ex1walker_tree_helper.h(84) : see declaration of
> 'ex1walker_tree_helper_Ctx_struct'
> =====
> 
> To reproduce, uncomment this line in ex1walker.g
> 
> // uncomment this line to see the problem with imported tree grammars
> import tree_helper;
> 
> This was the issue I emailed you about previously and you asked for an
> example.
> 

Right - I know what that is.

This will take a few days, I am incredibly busy at the moment.

Jim