[antlr-interest] ANTLR 3: Clueless about DFA prediction

Tue Sep 20 11:06:59 PDT 2005

On Sep 20, 2005, at 9:05 AM, Oliver Zeigermann wrote:

> I simply do not understand the DFA based predicion for this very
> simple lexer grammar:
>
> TEXT : (~'<')+ ;
> TAG : (~'>')+ ;
>
> I have attached the generated DFA as a GIF which reads like the TAG
> rule is predicated if *anywhere in the future input* there is an '<'
> to be found. This certainly was not my intention...

Hi Oliver, thanks for finding this!  It is a bug in my IntervalSet  
code I think.  If I change the characters to be adjacent, I get a  
different DFA!!!!  The following generates a proper DFA:

TEXT : (~'r')+ ;
TAG : (~'s')+ ;

whereas this one does not:

TEXT : (~'r')+ ;
TAG : (~'t')+ ;

Here is the prediction DFA for the first one with contiguous chars:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: t2_dec-3.pdf
Type: application/pdf
Size: 14198 bytes
Desc: not available
Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20050920/9fb757ab/t2_dec-3-0001.pdf
-------------- next part --------------

Note that ANTLR stops the analysis when it uniquely predicts which  
alt will win--it does not create a DFA to actually match the input.   
So, you'll note that states 2 and 3 do not have loops on them.  Also,  
the EOT option is like an else clause and effectively is the 's' arc  
here.

Keep up the great work, Oliver!!!!  I'm adding this as a unit test  
and tracking down the issue with the IntervalSets...

Woohoo!  Can't wait for the workshop.

Ter
--
CS Professor & Grad Director, University of San Francisco
Creator, ANTLR Parser Generator, http://www.antlr.org
Cofounder, http://www.jguru.com