[antlr-interest] Re: strange lexical nondeterminism
Lubos Vnuk
lubos.vnuk at rts.at
Mon Mar 8 09:38:46 PST 2004
Simon,
Your problem is caused by ANTLR's linear approximation of the look-
ahead. This means that ANTLR merges prediction sets of all the
alternative branches.
For REL_TIME it will be:
REL_TIME_LAset(1)={'0'..'9'} U {'0'..'2'}={'0'..'9'}
REL_TIME_LAset(2)={'0'..'9'} U {'0'..'9'}={'0'..'9'}
REL_TIME_LAset(3)={'0'..'9'} U {':'} ={'0'..'9',':'}
REL_TIME_LAset(4)={'.'} U {'0'..'5'}={'.','0'..'5'}
REL_TIME_LAset(5)={'0'..'2'} U {'0'..'9'}={'0'..'9'}
REL_TIME_LAset(6)={'0'..'9'} U {EOF} ={'0'..'9',EOF}
REL_TIME_LAset(7)={':'} U {EOF} ={':',EOF}
REL_TIME_LAset(8)={'0'..'5'} U {EOF} ={'0'..'5',EOF}
REL_TIME_LAset(9)={'0'..'9'} U {EOF} ={'0'..'9',EOF}
REL_TIME_LAset(10)={EOF} U {EOF} ={EOF}
...
For the INT rule it is (no linear approximation here):
INT_LAset(1)={'0'..'9'}
INT_LAset(2)={EOF,'0'..'9'}
INT_LAset(3)={EOF,'0'..'9'}
INT_LAset(4)={EOF,'0'..'9'}
INT_LAset(5)={EOF,'0'..'9'}
INT_LAset(6)={EOF,'0'..'9'}
LAsINT_et(7)={EOF,'0'..'9'}
INT_LAset(8)={EOF,'0'..'9'}
INT_LAset(9)={EOF,'0'..'9'}
INT_LAset(10)={EOF,'0'..'9'}
...
Now ANTLR will try to intersect these sets up to the maximum look-
ahead level in order to find two disjoint sets that it can use to
unambiguously decide on the rule.
REL_TIME_LAset(1) ^ INT_LAset(1)= {'0'..'9'}
REL_TIME_LAset(2) ^ INT_LAset(2)= {'0'..'9'}
REL_TIME_LAset(3) ^ INT_LAset(3)= {'0'..'9'}
REL_TIME_LAset(4) ^ INT_LAset(4)= {'0'..'5'}
REL_TIME_LAset(5) ^ INT_LAset(5)= {'0'..'9'}
REL_TIME_LAset(6) ^ INT_LAset(6)= {EOF,'0'..'9'}
REL_TIME_LAset(7) ^ INT_LAset(7)= {EOF}
REL_TIME_LAset(8) ^ INT_LAset(8)= {EOF'0'..'5'}
REL_TIME_LAset(9) ^ INT_LAset(9)= {EOF,'0'..'9'}
REL_TIME_LAset(10)^ INT_LAset(10)={EOF}
...
HTH,
Lubos.
--- In antlr-interest at yahoogroups.com, "Simon Kellett"
<skellett at a...> wrote:
> Two of the rules in my Lexer are:
>
> // optional 3 digit day, followed by hh:mm
> REL_TIME: ('0'..'9' '0'..'9' '0'..'9' '.')? '0'..'2' '0'..'9' ':'
> '0'..'5' '0'..'9';
> INT: ('0'..'9')+;
>
> But I get the following warning (k=10):
>
> lmp_grammer.g: warning:lexical nondeterminism between rules REL_TIME
> and INT upon
> lmp_grammer.g: k==1:'0'..'9'
> lmp_grammer.g: k==2:'0'..'9'
> lmp_grammer.g: k==3:'0'..'9'
> lmp_grammer.g: k==4:'0'..'5'
> lmp_grammer.g: k==5:'0'..'9'
> lmp_grammer.g: k==6:<end-of-token>,'0'..'9'
> lmp_grammer.g: k==7:<end-of-token>
> lmp_grammer.g: k==8:<end-of-token>,'0'..'5'
> lmp_grammer.g: k==9:<end-of-token>,'0'..'9'
> lmp_grammer.g: k==10:<end-of-token>
>
> If I remove the optional "?" then the warning disappears.
>
> Surely the Lexer (with k>3) should always know whether it is dealing
> with a REL_TIME or an INT ? (if the 4th char is a '.' or the 3rd is
a
> ':' then the token must be a REL_TIME).
>
> I could understand the nondeterminism if k=3 !!
>
> But the generated code is OK: the REL_TIME is checked for first, and
> only later the INT is check for.
>
> TIA, Simon
Yahoo! Groups Links
<*> To visit your group on the web, go to:
http://groups.yahoo.com/group/antlr-interest/
<*> To unsubscribe from this group, send an email to:
antlr-interest-unsubscribe at yahoogroups.com
<*> Your use of Yahoo! Groups is subject to:
http://docs.yahoo.com/info/terms/
More information about the antlr-interest
mailing list