[antlr-interest] Trouble with nondeterminism

Spálený Ivo Ivo.Spaleny at bsp.cz
Mon Aug 28 18:17:46 PDT 2006


Hi,

Sub-rule

  ('0'..'9')+ (Exponent)?

can be splitted into:

  ('0' | '1'..'9' ('0'..'9')*) // duplicity of INT token
| ('0'..'9')+ Exponent         // duplicity of an alternative DOUBLE branch 
| '0' ('0'..'9')+              // the unique piece of information in this sub-rule

In ANTLR point of view, nondeterministic input probably results in deterministic output. ANTLR disables alternatives. But isn't it better to be sure; if "1" is DOUBLE or INT token finally?

Best regards,

Ivo Spaleny


-----Original Message-----
From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of John Gruenenfelder
Sent: Tuesday, August 29, 2006 1:38 AM
To: antlr-interest at antlr.org
Subject: [antlr-interest] Trouble with nondeterminism

I keeping with my "starting simple" plan, I've been adding bits to my simple
parser as I get the hang of ANTLR.  The many examples on the Net are also
helping a great deal.

But...  I'm having trouble trying to have my lexer recognize both int and
double number types.  I've tried a few different examples all ending in
similar ANTLR messages.  It looks like this:

Constant
    :   INT
    |   DOUBLE
    ;

INT
    :   ('0' | '1'..'9' ('0'..'9')*) ;

DOUBLE
    :   ('0'..'9')+ '.' ('0'..'9')* (Exponent)?
    |   '.' ('0'..'9')+ (Exponent)?
    |   ('0'..'9')+ Exponent
    |   ('0'..'9')+ (Exponent)?
    ;

protected
Exponent
    :   ('e'|'E') ('+'|'-')? ('0'..'9')+ ;


Now, I understand that very clearly and it *seems* to make sense.  But when I
run it through ANTLR, I get:

opc.g: warning:lexical nondeterminism between rules Constant and INT upon
opc.g:     k==1:'0'..'9'
opc.g:     k==2:<end-of-token>,'0'..'9'
opc.g:     k==3:<end-of-token>,'0'..'9'
opc.g: warning:lexical nondeterminism between rules Constant and DOUBLE upon
opc.g:     k==1:'.','0'..'9'
opc.g:     k==2:<end-of-token>,'.','0'..'9','E','e'
opc.g:     k==3:<end-of-token>,'+','-','.','0'..'9','E','e'
opc.g: warning:lexical nondeterminism between rules INT and DOUBLE upon
opc.g:     k==1:'0'..'9'
opc.g:     k==2:<end-of-token>,'0'..'9'
opc.g:     k==3:<end-of-token>,'0'..'9'
opc.g:181: warning:lexical nondeterminism between alts 1 and 2 of block upon
opc.g:181:     k==1:'0'..'9'
opc.g:181:     k==2:<end-of-token>,'0'..'9'
opc.g:181:     k==3:<end-of-token>,'0'..'9'
opc.g:214: warning:lexical nondeterminism between alts 1 and 3 of block upon
opc.g:214:     k==1:'0'..'9'
opc.g:214:     k==2:'0'..'9'
opc.g:214:     k==3:'0'..'9','E','e'
opc.g:214: warning:lexical nondeterminism between alts 1 and 4 of block upon
opc.g:214:     k==1:'0'..'9'
opc.g:214:     k==2:'0'..'9'
opc.g:214:     k==3:<end-of-token>,'0'..'9','E','e'
opc.g:214: warning:lexical nondeterminism between alts 3 and 4 of block upon
opc.g:214:     k==1:'0'..'9'
opc.g:214:     k==2:'0'..'9','E','e'
opc.g:214:     k==3:'+','-','0'..'9','E','e'

I see that and it also makes sense, since ANTLR is kind enough to tell me just
where the lexer will get confused.

I think my troubles stem from prior experience with tools like bison/flex and
now working with a recursive decent parser like ANTLR.  There's something I'm
just not "getting"...  For example, maybe these are warnings I don't need to
worry about, not unlike the occasional shift/reduce conflict one might
encounter with bison?  Or maybe I *do* need to worry about them...

I would greatly appreceiate any help in clearing up my confusion.  I'm sure
it's not that difficult a task.  Thanks again.


-- 
--John Gruenenfelder    Research Assistant, UMass Amherst student
                        Systems Manager, MKS Imaging Technology, LLC.
Try Weasel Reader for PalmOS  --  http://gutenpalm.sf.net
"This is the most fun I've had without being drenched in the blood
of my enemies!"
        --Sam of Sam & Max


More information about the antlr-interest mailing list