[antlr-interest] Updates for release 3.2 of the C Target

Wed Sep 23 15:41:47 PDT 2009

The following should be read carefully before upgrading to release 3.2 
of the C target:

Firstly, please note that this is kind of an interim release for targets 
other than the Java target and we will be releasing a 3.2.1 version that 
gets all the actively maintained targets up to date as soon as possible. 
However, a number of bugs have been fixed for the C target, which you 
may or may not wish to upgrade for:

1) A number of required bug fixes for tree parsing are included in this 
release, notably some situations where an empty tree could be produced 
by accident in a parser - I suspect that there is one more instance of 
this in 3.2 but cannot yet trace it. However, I am going to replace the 
tree building parts that are responsible for this in 3.2.1, which will 
drastically reduce memory footprints, reuse discarded objects and 
drastically improve tree building performance;
2) A number of bug fixes over release 3.1.3, as shown by looking through 
the JIRA reports closed for 3.1.4 (which will change to 3.2 shortly - we 
were going to have 3.1.4, but the changes were too much for a simple 2 
point release);
3) Changes and fixes for the handling of @after, catch and finally. 
There has always been some confusion over what these mean and especially 
in the C target, where the concepts are not easily handled for all 
combinations of rule options. I would very much like for some of you to 
test these changes before 3.2.1:

Note that the definitions now only depart from Java in the fact that 
the sections do not unwind scopes until AFTER the coded sections have 
been run. This is because there is no garbage collection in C, and 
unwinding too early means it is difficult to free memory allocated in 
scopes (although, that is what the freePtr element of all scope entries 
is for really)...

    * @declarations - used to separate rule local variable definitions
      from their initializations.
    * @init - always gets executed, even before backtracking/memoization
      is looked for.
    * @after - gets executed even if memoization determines that the
      rule has already run for this token position. This is so you can
      release anything that you allocated in @init. Note however that
      @after runs ONLY if the parse was successful. If the rule hits a
      parsing error, then @after is no executed.
    * catch[] - Note that any exception type listed between [] is
      ignored. You can check the exception type by switching on
      EXCEPTION->type. These sections of code are ONLY executed if a
      parsing exception is detected while in the rule.
    * finally - this code is executed whether the rule parses
      successfully or not. This is probably where you should free
      allocations, but you should guard against uninitialized elements,
      which may not have been created on a parse error.

CAVEATS:

    * I have not found a way to access finally code when a rule returns
      as part of a backtracking parser that fails the parse. I ran out
      of time and will look again at this for 3.2.1 - but you should not
      be using backtracking parsers anyway, right ;-). This means that
      you should probably not make allocations in @init until I can work
      out a scheme for this - the problem is the accessibility of the
      finally code block in tree walkers. I have to make it available
      in a wider stringtemplate scope.
    * I have not performed too much testing on this arrangement, so
      please be careful to test your parsers before using 3.2 in
      production (though it is otherwise sound, I think).
    * Because you can end up executing both @after and finally code, use
      safe programming patterns (you should be anyway ;-)): assign NULL
      to any pointer when you declare it and again immediately after
      freeing it. Also, guard against free(NULL) unless the semantics of
      your system specify that free(NULL) is allowed. Basically you don't
      want to end up freeing things twice.
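
As a minimal sketch of the last caveat (plain C, not generated code): the
helper release_once() and the function simulate_rule() are hypothetical
names invented for this illustration and are not part of the ANTLR3 C
runtime. The idea is only that a pointer starts out as NULL, is set back to
NULL as soon as it has been freed, and that both the @after code and the
finally code go through the same guarded helper, so whichever runs second
does nothing. ISO C defines free(NULL) as a no-op, but the guard also covers
custom allocators where that may not hold.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an ANTLR3 runtime function): free *slot at most
 * once and leave NULL behind, so a second call is harmless.
 * Returns 1 if memory was actually released, 0 otherwise. */
int release_once(char **slot)
{
    if (slot == NULL || *slot == NULL) {
        return 0;               /* never allocated, or already released */
    }
    free(*slot);
    *slot = NULL;               /* mark as released for any later caller */
    return 1;
}

/* Stand-in for one rule invocation. The comments name the grammar section
 * that each statement would live in. Returns the number of real frees. */
int simulate_rule(int parse_ok)
{
    char *buf = NULL;           /* @declarations: declare and assign NULL */
    int frees = 0;

    buf = malloc(16);           /* @init: allocate scratch memory */
    if (buf != NULL) {
        strcpy(buf, "scratch");
    }

    if (parse_ok) {
        frees += release_once(&buf);    /* @after: successful parse only */
    }
    frees += release_once(&buf);        /* finally: runs either way */

    return frees;
}
```

Whether the parse succeeds or not, the buffer is released exactly once:
simulate_rule(1) and simulate_rule(0) both report a single free.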

Here is an example of how to use the sections:

rule
scope { int a; }
@init {
  // init code is here
}
@after {
  // After code here - runs if parse is successful
}
: a | b
;
catch []
{
  // exception code - runs on parser error detection
}
finally {
  // finally code - always runs, see caveats though
}

If you make a noddy grammar of this with whatever options you are 
using,  and generate the C code, you can see where the code for each 
section resides.

While still not perfect because of the after stuff in backtracking 
parsers, you should now be able to clean up scopes and so on (though I 
would still recommend you add a free function to the scope instance in 
the @init() section as per the C parser example).
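
To illustrate the idea of attaching a free function to a scope instance,
here is a self-contained sketch in plain C. It does not use the ANTLR3
runtime: the struct demo_scope, its free_fn member and the demo_* functions
are all hypothetical stand-ins, and the real generated scope structure and
field names may well differ (check the generated code and the C parser
example). The point is only the pattern: @init allocates and registers a
cleanup callback, and whoever unwinds the scope calls it once.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a generated scope frame; all names here are
 * assumptions made for this illustration. */
typedef struct demo_scope {
    void (*free_fn)(struct demo_scope *frame); /* cleanup hook set in @init */
    int  *values;                               /* memory owned by the scope */
    int   a;                                    /* the "int a;" scope member */
} demo_scope;

int demo_frames_released = 0;   /* counter so the effect can be observed */

/* The cleanup callback: releases what the scope owns. */
void demo_scope_free(demo_scope *frame)
{
    free(frame->values);
    frame->values = NULL;
    demo_frames_released++;
}

/* Roughly what the @init section would do: allocate, then register the
 * cleanup callback on the scope instance. Returns 1 on success. */
int demo_scope_init(demo_scope *frame, size_t n)
{
    frame->a = 0;
    frame->values = calloc(n, sizeof *frame->values);
    frame->free_fn = demo_scope_free;
    return frame->values != NULL;
}

/* Roughly what happens when the scope is unwound: call the hook if one
 * was registered, then clear it so a second unwind does nothing. */
void demo_scope_pop(demo_scope *frame)
{
    if (frame->free_fn != NULL) {
        frame->free_fn(frame);
        frame->free_fn = NULL;
    }
}
```

With this arrangement the cleanup does not depend on @after or finally being
reached, which is why it sidesteps the backtracking caveat in this sketch.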

4) Changes to default settings controlling switch() vs if() vs tableDFA. 
The C Target now overrides the default settings for these elements to do 
the following:

Avoid generating tables at all costs. Even though the tables are 
reasonably contiguous and CPU cache hits will help, cache misses will 
dog the performance of your parser. Hence it is preferable to use 
inline if statements over tables, because branch prediction kicks in. 
However, modern C compilers are very good at optimizing large switch 
statements, and so the defaults are now changed to generate a switch 
directly for any alt selection between 1 and 3000 labels. You can 
increase this further with the new -X options (see the -X help message 
from the tool for details). In general, this improves performance a lot 
for complicated parsers with lots of keywords that can be identifiers. 
As much as 25% can be gained. [Thanks to Yitzhak Sapir of DCF Tech for 
pointing the way here.]
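
For readers who have not looked at the three styles side by side, here is a
hand-written sketch in plain C of one alternative prediction done as a
table lookup, as chained if statements, and as a switch. It is not output
from the tool: the token names, their values and the predict_* functions
are made up for this illustration, and real generated code is considerably
more involved. All three functions return the same alternative number for
the same lookahead token; only the shape of the code the compiler sees
differs.

```c
/* Hypothetical token types for the illustration. */
enum { TOK_ID = 4, TOK_INT = 5, TOK_IF = 6, TOK_WHILE = 7, TOK_MAX = 8 };

/* Table style: the decision is data, so each prediction is a memory load. */
static const signed char alt_table[TOK_MAX] = {
    -1, -1, -1, -1,     /* token types 0..3: no viable alternative */
     1,                 /* TOK_ID    -> alt 1 */
     2,                 /* TOK_INT   -> alt 2 */
     3,                 /* TOK_IF    -> alt 3 */
     3                  /* TOK_WHILE -> alt 3 */
};

int predict_table(int la1)
{
    if (la1 < 0 || la1 >= TOK_MAX) {
        return -1;
    }
    return alt_table[la1];
}

/* Inline if style: the decision is code, tested one comparison at a time. */
int predict_if(int la1)
{
    if (la1 == TOK_ID) {
        return 1;
    } else if (la1 == TOK_INT) {
        return 2;
    } else if (la1 == TOK_IF || la1 == TOK_WHILE) {
        return 3;
    }
    return -1;
}

/* Switch style: the decision is still code, but the compiler is free to
 * pick a jump table, a binary search or compares as it sees fit. */
int predict_switch(int la1)
{
    switch (la1) {
    case TOK_ID:
        return 1;
    case TOK_INT:
        return 2;
    case TOK_IF:
    case TOK_WHILE:
        return 3;
    default:
        return -1;
    }
}
```

With only four labels the difference is invisible; the trade-off described
above matters when a decision has hundreds or thousands of labels.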

The defaults for other targets are unchanged, but you may wish to 
experiment with these new -X options:

$ java -jar antlr-3.2.jar -X
ANTLR Parser Generator  Version 3.2 Sep 23, 2009 14:39:33
   -Xgrtree                print the grammar AST
   -Xdfa                   print DFA as text
   -Xnoprune               test lookahead against EBNF block exit branches
   -Xnocollapse            collapse incident edges into DFA states
   -Xdbgconversion         dump lots of info during NFA conversion
   -Xmultithreaded         run the analysis in 2 threads
   -Xnomergestopstates     do not merge stop states
   -Xdfaverbose            generate DFA states in DOT with NFA configs
   -Xwatchconversion       print a message for each NFA before converting
   -XdbgST                 put tags at start/stop of all templates in output
   -Xnfastates             for nondeterminisms, list NFA states for each path
   -Xm m                   max number of rule invocations during conversion           [4]
   -Xmaxdfaedges m         max "comfortable" number of edges for single DFA state     [65534]
   -Xconversiontimeout t   set NFA conversion timeout (ms) for each decision          [1000]
   -Xmaxinlinedfastates m  max DFA states before table used rather than inlining      [10]
   -Xmaxswitchcaselabels m don't generate switch() statements for dfas bigger  than m [300]
   -Xminswitchalts m       don't generate switch() statements for dfas smaller than m [3]

Note that the API docs are not yet updated. I will do this with 3.2.1, 
which will have greatly expanded documentation.

Jim
