[antlr-interest] C generator is not generating @after actions

Thu Feb 5 14:58:15 PST 2009

At 05:31 6/02/2009, Jim Idle wrote:
>You are correct that it isn't exactly semantically equivalent, 
>but I have never seen a case where people wanted to do anything 
>different. The exception clause would generally be more useful. 
>In a real @after clause, you would have to make sure you checked
>any references for NULL before trying to do anything anyway. So, 
>at least for the more usual case, it makes more sense to have the 
>action that you want to happen at the rule end, when it is 
>successful, in an action at the end. Then, if there is something 
>special you need to do upon failure, you want an exception clause 
>like the Java target. I think most people are reading @after and 
>just see "when this rule finishes successfully".

I think the theory is that if the @init does something that needs 
cleaning up after -- such as allocating memory or opening a file 
-- then the @after does the cleanup.  In practice for C# / Java 
targets it's not critical, as the GC will eventually get around to 
tidying things up anyway (though it's not useless there either, if
the rule is re-entered faster than the GC can tidy up), but for C
it'd be more useful.
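To make the init/cleanup pairing concrete, here is a minimal hand-written C sketch. It is not ANTLR-generated code. `rule_item` is a hypothetical stand-in for a rule function: the malloc plays the role of the @init action and the code after the `done:` label plays the role of the cleanup. The single exit label is what lets the cleanup run on the failure path as well as the success path.

```c
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/* Hypothetical counter: how many times the cleanup has run. */
static int cleanup_count = 0;

/* Hypothetical stand-in for a rule function; returns 1 on success, 0 on failure. */
static int rule_item(const char *input)
{
    int ok = 0;

    /* "@init": acquire a resource the rule needs */
    char *scratch = malloc(strlen(input) + 1);
    if (scratch == NULL)
        return 0;

    /* rule body: pretend the rule only matches non-empty input */
    if (input[0] == '\0')
        goto done;              /* failure path still reaches the cleanup */

    strcpy(scratch, input);
    ok = 1;

done:
    /* cleanup, reached on both success and failure */
    free(scratch);
    cleanup_count++;
    return ok;
}
```

Whether ANTLR's own @after runs on the failure path is exactly the open question in this thread. The sketch only shows the pairing being argued for.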

Having said that, there's always another way to write the code to 
avoid that kind of dependency anyway, so in practice I've never 
needed to do it that way.

>I still plan on working through all the possible combinations 
>before 3.1.2 is released. The difficulty is not adding an @after 
>section of course, but that using a static template, can I make 
>sure that all possible code paths, given all possible rule 
>element combinations, including backtracking, @after, exceptions 
>and so on, thread their way correctly through the generated C 
>code and are semantically equivalent to the Java code. The answer 
>may well be yes, but I want to consider performance and complexity,
>as if there is a semantically equivalent way of expressing this 
>in the grammar, then it might make more sense to just instruct 
>people in the documentation.

I can't see how you could ever make the C target tolerate 
exceptions being thrown mid-rule without turning it into a partial 
C++ target :)

Besides, nobody ever reads documentation anyway ;)
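Plain C has no exceptions, so a rule failure has to be propagated by hand. Here is a minimal sketch of that. It assumes a hypothetical `parser` struct with a `failed` flag, which is not the ANTLR C runtime's actual API. After each element the rule checks the flag and returns early, rather than throwing.

```c
#include <stdbool.h>
#include <assert.h>

/* Hypothetical parser state; not the ANTLR C runtime's real structure. */
typedef struct {
    const char *p;   /* current input position */
    bool failed;     /* set instead of throwing an exception */
} parser;

/* Match one expected character, flagging failure instead of throwing. */
static void match(parser *ps, char c)
{
    if (*ps->p == c)
        ps->p++;
    else
        ps->failed = true;
}

/* rule pair : '(' 'x' ')' ;  every step checks the flag and unwinds by returning */
static void rule_pair(parser *ps)
{
    match(ps, '(');
    if (ps->failed) return;
    match(ps, 'x');
    if (ps->failed) return;
    match(ps, ')');
}
```

Every rule (and every caller) has to repeat these checks. That explicit threading is what stands in for mid-rule exceptions.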

>So, in the C target I have removed pretty much all the NULL 
>guards as it is better to get a violation than mask 
>grammar/coding errors. In the case of the return from a rule, the 
>return is in fact a struct, which is declared as such in the 
>calling rule. The struct in the calling rule will therefore never 
>be NULL, and memsetting it to 0 does not solve that issue, though 
>it could have a special field that says if it has been used yet 
>and so on.

Memsetting it to 0 will clear the contents of the struct, though,
ensuring that any embedded pointers etc. really are NULL and will
fail quickly, instead of holding some random address that happened
to be on the heap (or worse, a valid address left over from a
previous instance of the same structure that's being reused). Those
fail subtly rather than obviously, and are just as hard to track
down.
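Here is a minimal sketch of the memset-on-entry idea, using a hypothetical `item_return` struct and `rule_value` function (not the real generated return type or rule). It assumes that an all-bits-zero pointer is NULL. That holds on all common platforms, although the C standard itself doesn't guarantee it.

```c
#include <string.h>
#include <stddef.h>
#include <assert.h>

/* Hypothetical return struct, loosely modelled on a rule's return values. */
typedef struct {
    const char *start;  /* where the rule started matching */
    void *tree;         /* tree built by the rule, if any */
    int value;          /* a user-declared return value */
} item_return;

/* Hypothetical rule that may bail out before setting every field. */
static item_return rule_value(const char *input, int fail_early)
{
    item_return r;
    memset(&r, 0, sizeof r);  /* every field starts as 0 / NULL */
    r.start = input;
    if (fail_early)
        return r;             /* tree and value were never assigned */
    r.value = 42;
    return r;
}
```

Without the memset, the fields left unassigned on the early return would hold whatever was on the stack. With it, `tree` is NULL and `value` is 0, so a careless caller dereferencing `tree` fails immediately.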

I can understand your reasoning, though; once the grammar *is* 
doing the right things then the memsets are just wasting 
time.  But during initial development and debugging they're 
invaluable to prevent subtle bugs, as you yourself basically 
admitted in the paragraph prior to the one I quoted.

Maybe grammars should have an additional option, telling ANTLR 
whether to aim for robustness (thereby including extra sanity 
checks, such as the memsets) or for performance (leaving them out, 
once the author is happy that their grammar works properly, if 
slowly).  The default should be for robustness, so that newly 
developed grammars get sanity checked.  In fact, rather than 
memsetting to 0, you could take a page from VC++ and memset to 
0xCD when in robust mode (and do nothing in performance mode), 
thereby basically guaranteeing a crash in robust mode if someone 
tries to use something without initialising it first, since NULL 
checks wouldn't work.

(A grammar compiled with -debug should probably also use robust 
mode regardless of the option, but there should still be the 
separate option for non -debug compiles, since not everyone uses 
-debug at all.)
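Here is a rough sketch of how such a robust/performance switch might look in generated C. `RULE_ROBUST` and `RULE_PREPARE_RETURN` are hypothetical names, not existing ANTLR options. In robust mode the return struct is filled with 0xCD bytes in the style of VC++ debug builds; in performance mode it is left alone.

```c
#include <string.h>
#include <assert.h>

/* Hypothetical switch: robust mode is the default, as suggested above. */
#ifndef RULE_ROBUST
#define RULE_ROBUST 1
#endif

#if RULE_ROBUST
/* Fill with a recognisable junk byte that a NULL check won't mask. */
#define RULE_PREPARE_RETURN(p) memset((p), 0xCD, sizeof *(p))
#else
/* Performance mode: leave the struct uninitialised. */
#define RULE_PREPARE_RETURN(p) ((void)(p))
#endif

/* Hypothetical return struct for a list rule. */
typedef struct {
    int *values;
    int count;
} list_return;

/* Hypothetical rule that forgets to set `values`. */
static list_return rule_list(void)
{
    list_return r;
    RULE_PREPARE_RETURN(&r);
    r.count = 0;
    return r;
}
```

In robust mode the forgotten `values` pointer is 0xCDCD... rather than NULL or a stale address, so using it should crash fast. Compiling with `-DRULE_ROBUST=0` removes the fill entirely.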