[antlr-interest] new DFA implementation for generated v3
recognizers
Terence Parr
parrt at cs.usfca.edu
Sat Mar 25 17:48:29 PST 2006
On Mar 25, 2006, at 5:09 PM, shmuel siegel wrote:
> Terence Parr wrote:
>> Hi, see
>> http://www.antlr.org/blog/antlr3/codegen.tml
>> for a write-up on new DFA implementation; should be able to do in
>> a few months.
>
> Have you looked into using a ternary tree approach for compression?
> Should handle Unicode fairly well but will probably need some
> modification to handle "not sets".
Yeah, i am not sure that is the best data structure; seems like it
would be pretty slow. It would effectively split states with lots of
edges into many many states. Thanks for the suggestion though...
The solution I have is optimal in time and very good in space for all
states with less than some threshold number of edges. For example,
if a state has just 1 edge, then there is an array of size 1 for that
state stored at transition[state]. Worst case density (i.e., most
sparse) would be states with char \u0000 and char 0..0xFFFE for the
edges. I'd waste all the space in between. This will waste 64k ints
for any state that does that, but naturally, I'll push that to a
"special" state.
Ter
More information about the antlr-interest
mailing list