[antlr-interest] new DFA implementation for generated v3 recognizers

Sat Mar 25 17:48:29 PST 2006

On Mar 25, 2006, at 5:09 PM, shmuel siegel wrote:

> Terence Parr wrote:
>> Hi, see
>> http://www.antlr.org/blog/antlr3/codegen.tml
>> for a write-up on new DFA implementation; should be able to do in  
>> a few months.
>
> Have you looked into using a ternary tree approach for compression?  
> Should handle Unicode fairly well but will probably need some  
> modification to handle "not sets".

Yeah, i am not sure that is the best data structure; seems like it  
would be pretty slow.  It would effectively split states with lots of  
edges into many many states.  Thanks for the suggestion though...

The solution I have is optimal in time and very good in space for all  
states with less than some threshold number of edges.  For example,  
if a state has just 1 edge, then there is an array of size 1 for that  
state stored at transition[state].  Worst case density (i.e., most  
sparse) would be states with char \u0000 and char 0..0xFFFE for the  
edges.  I'd waste all the space in between.  This will waste 64k ints  
for any state that does that, but naturally, I'll push that to a  
"special" state.

Ter