[antlr-interest] Trouble with nondeterminism
Spálený Ivo
Ivo.Spaleny at bsp.cz
Mon Aug 28 18:17:46 PDT 2006
Hi,
Sub-rule
('0'..'9')+ (Exponent)?
can be splitted into:
('0' | '1'..'9' ('0'..'9')*) // duplicity of INT token
| ('0'..'9')+ Exponent // duplicity of an alternative DOUBLE branch
| '0' ('0'..'9')+ // the unique piece of information in this sub-rule
In ANTLR point of view, nondeterministic input probably results in deterministic output. ANTLR disables alternatives. But isn't it better to be sure; if "1" is DOUBLE or INT token finally?
Best regards,
Ivo Spaleny
-----Original Message-----
From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of John Gruenenfelder
Sent: Tuesday, August 29, 2006 1:38 AM
To: antlr-interest at antlr.org
Subject: [antlr-interest] Trouble with nondeterminism
I keeping with my "starting simple" plan, I've been adding bits to my simple
parser as I get the hang of ANTLR. The many examples on the Net are also
helping a great deal.
But... I'm having trouble trying to have my lexer recognize both int and
double number types. I've tried a few different examples all ending in
similar ANTLR messages. It looks like this:
Constant
: INT
| DOUBLE
;
INT
: ('0' | '1'..'9' ('0'..'9')*) ;
DOUBLE
: ('0'..'9')+ '.' ('0'..'9')* (Exponent)?
| '.' ('0'..'9')+ (Exponent)?
| ('0'..'9')+ Exponent
| ('0'..'9')+ (Exponent)?
;
protected
Exponent
: ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
Now, I understand that very clearly and it *seems* to make sense. But when I
run it through ANTLR, I get:
opc.g: warning:lexical nondeterminism between rules Constant and INT upon
opc.g: k==1:'0'..'9'
opc.g: k==2:<end-of-token>,'0'..'9'
opc.g: k==3:<end-of-token>,'0'..'9'
opc.g: warning:lexical nondeterminism between rules Constant and DOUBLE upon
opc.g: k==1:'.','0'..'9'
opc.g: k==2:<end-of-token>,'.','0'..'9','E','e'
opc.g: k==3:<end-of-token>,'+','-','.','0'..'9','E','e'
opc.g: warning:lexical nondeterminism between rules INT and DOUBLE upon
opc.g: k==1:'0'..'9'
opc.g: k==2:<end-of-token>,'0'..'9'
opc.g: k==3:<end-of-token>,'0'..'9'
opc.g:181: warning:lexical nondeterminism between alts 1 and 2 of block upon
opc.g:181: k==1:'0'..'9'
opc.g:181: k==2:<end-of-token>,'0'..'9'
opc.g:181: k==3:<end-of-token>,'0'..'9'
opc.g:214: warning:lexical nondeterminism between alts 1 and 3 of block upon
opc.g:214: k==1:'0'..'9'
opc.g:214: k==2:'0'..'9'
opc.g:214: k==3:'0'..'9','E','e'
opc.g:214: warning:lexical nondeterminism between alts 1 and 4 of block upon
opc.g:214: k==1:'0'..'9'
opc.g:214: k==2:'0'..'9'
opc.g:214: k==3:<end-of-token>,'0'..'9','E','e'
opc.g:214: warning:lexical nondeterminism between alts 3 and 4 of block upon
opc.g:214: k==1:'0'..'9'
opc.g:214: k==2:'0'..'9','E','e'
opc.g:214: k==3:'+','-','0'..'9','E','e'
I see that and it also makes sense, since ANTLR is kind enough to tell me just
where the lexer will get confused.
I think my troubles stem from prior experience with tools like bison/flex and
now working with a recursive decent parser like ANTLR. There's something I'm
just not "getting"... For example, maybe these are warnings I don't need to
worry about, not unlike the occasional shift/reduce conflict one might
encounter with bison? Or maybe I *do* need to worry about them...
I would greatly appreceiate any help in clearing up my confusion. I'm sure
it's not that difficult a task. Thanks again.
--
--John Gruenenfelder Research Assistant, UMass Amherst student
Systems Manager, MKS Imaging Technology, LLC.
Try Weasel Reader for PalmOS -- http://gutenpalm.sf.net
"This is the most fun I've had without being drenched in the blood
of my enemies!"
--Sam of Sam & Max
More information about the antlr-interest
mailing list