[antlr-interest] Trouble with nondeterminism

Mon Aug 28 16:37:38 PDT 2006

In keeping with my "starting simple" plan, I've been adding bits to my simple
parser as I get the hang of ANTLR.  The many examples on the Net are also
helping a great deal.

But...  I'm having trouble getting my lexer to recognize both int and
double number types.  I've tried a few different examples, all ending in
similar ANTLR messages.  My current attempt looks like this:

Constant
    :   INT
    |   DOUBLE
    ;

INT
    :   ('0' | '1'..'9' ('0'..'9')*) ;

DOUBLE
    :   ('0'..'9')+ '.' ('0'..'9')* (Exponent)?
    |   '.' ('0'..'9')+ (Exponent)?
    |   ('0'..'9')+ Exponent
    |   ('0'..'9')+ (Exponent)?
    ;

protected
Exponent
    :   ('e'|'E') ('+'|'-')? ('0'..'9')+ ;

Now, I understand that very clearly and it *seems* to make sense.  But when I
run it through ANTLR, I get:

opc.g: warning:lexical nondeterminism between rules Constant and INT upon
opc.g:     k==1:'0'..'9'
opc.g:     k==2:<end-of-token>,'0'..'9'
opc.g:     k==3:<end-of-token>,'0'..'9'
opc.g: warning:lexical nondeterminism between rules Constant and DOUBLE upon
opc.g:     k==1:'.','0'..'9'
opc.g:     k==2:<end-of-token>,'.','0'..'9','E','e'
opc.g:     k==3:<end-of-token>,'+','-','.','0'..'9','E','e'
opc.g: warning:lexical nondeterminism between rules INT and DOUBLE upon
opc.g:     k==1:'0'..'9'
opc.g:     k==2:<end-of-token>,'0'..'9'
opc.g:     k==3:<end-of-token>,'0'..'9'
opc.g:181: warning:lexical nondeterminism between alts 1 and 2 of block upon
opc.g:181:     k==1:'0'..'9'
opc.g:181:     k==2:<end-of-token>,'0'..'9'
opc.g:181:     k==3:<end-of-token>,'0'..'9'
opc.g:214: warning:lexical nondeterminism between alts 1 and 3 of block upon
opc.g:214:     k==1:'0'..'9'
opc.g:214:     k==2:'0'..'9'
opc.g:214:     k==3:'0'..'9','E','e'
opc.g:214: warning:lexical nondeterminism between alts 1 and 4 of block upon
opc.g:214:     k==1:'0'..'9'
opc.g:214:     k==2:'0'..'9'
opc.g:214:     k==3:<end-of-token>,'0'..'9','E','e'
opc.g:214: warning:lexical nondeterminism between alts 3 and 4 of block upon
opc.g:214:     k==1:'0'..'9'
opc.g:214:     k==2:'0'..'9','E','e'
opc.g:214:     k==3:'+','-','0'..'9','E','e'

I see that and it also makes sense, since ANTLR is kind enough to tell me just
where the lexer will get confused.
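
To convince myself where the rules really overlap, here is a rough sketch in
Python (not ANTLR).  The regular expressions are my own hand translation of
the rules above, so treat them as an assumption, and the names RULES and
matching_rules are made up for this illustration.  It only checks whole
strings, which is not how ANTLR decides with a fixed lookahead, so it shows
the genuine overlaps but not the warnings that come from alternatives sharing
a long common prefix.

```python
import re

# Hand translation of the protected Exponent rule (an assumption, not ANTLR output).
EXP = r"[eE][+-]?[0-9]+"

# One entry per lexer rule or alternative, in the order they appear in the grammar.
RULES = {
    "INT": r"0|[1-9][0-9]*",
    "DOUBLE alt 1": rf"[0-9]+\.[0-9]*(?:{EXP})?",
    "DOUBLE alt 2": rf"\.[0-9]+(?:{EXP})?",
    "DOUBLE alt 3": rf"[0-9]+{EXP}",
    "DOUBLE alt 4": rf"[0-9]+(?:{EXP})?",
}


def matching_rules(text):
    """Return the names of every rule/alternative that matches the whole text."""
    return [name for name, pattern in RULES.items() if re.fullmatch(pattern, text)]


if __name__ == "__main__":
    for sample in ["42", "1e5", "3.14", ".5"]:
        print(sample, "->", matching_rules(sample))
```

Any input that comes back with more than one name is text the lexer could
legitimately tokenize in more than one way.  In this sketch "42" matches both
INT and the fourth DOUBLE alternative, and "1e5" matches both the third and
fourth DOUBLE alternatives.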

I think my troubles stem from prior experience with tools like bison/flex and
now working with a recursive descent parser generator like ANTLR.  There's
something I'm just not "getting"...  For example, maybe these are warnings I
don't need to worry about, not unlike the occasional shift/reduce conflict one
might encounter with bison?  Or maybe I *do* need to worry about them...

I would greatly appreciate any help in clearing up my confusion.  I'm sure
it's not that difficult a task.  Thanks again.

-- 
--John Gruenenfelder    Research Assistant, UMass Amherst student
                        Systems Manager, MKS Imaging Technology, LLC.
Try Weasel Reader for PalmOS  --  http://gutenpalm.sf.net
"This is the most fun I've had without being drenched in the blood
of my enemies!"
        --Sam of Sam & Max