[antlr-interest] Nondeterminism problem

Kay Roepke kroepke at classdump.org
Tue Oct 17 04:18:10 PDT 2006


Hi Allan,


On Oct 17, 2006, at 12:50 PM, Allan Odgaard wrote:

> When I run it through ANTLR I get this warning:
>
>    warning:lexical nondeterminism between rules MARK and ALL upon
>       k==1:'A'
>       k==2:'B'
>
> I don't really understand the problem here, so I'd hoped someone  
> could help
> me.
>
> If I inspect the code generated for ALL then it looks like this:
>
>    (LA(1) >= '\u0001' && LA(1) <= '\u00ff') && true
>
> I can see how that match is ambiguous with the one for MARK:
>
>    LA(1) == 'A' && LA(2) == 'B'
>
> What surprises me is that ANTLR generates such broad match for ALL.  
> It seems
> it takes the union of all the LA(1) matches for the branches, then  
> &&s that
> with the union of all the LA(2)'s. Thus actually matching broader  
> than what
> the branches would handle.

Yes, it is called linear approximate lookahead. You've hit the issue  
with it, where it collapses the FIRST sets to make it linear instead
of exponential.
See <http://www.antlr.org/doc/ 
glossary.html#Linear_approximate_lookahead> and <http:// 
seclib.blogspot.com/2005/11/linear-approximate-lookahead.html>
for an explanation.

> Is this understanding correct? Is there a simple workaround for the  
> above
> lexer, or would it need to be resolved in the parser?

You might have to reorganize your grammar and maybe set the correct  
token types manually in actions.
You'll be pleased to hear that ANTLR v3 doesn't do linear approximate  
lookahead anymore and will generate the correct LL(2) code for this.

         int LA2_0 = input.LA(1);
         if ( (LA2_0=='A') ) {
             int LA2_1 = input.LA(2);
             if ( (LA2_1=='B') ) {
                 alt2=1;
             }
             else if ( ((LA2_1>='\u0000' && LA2_1<='A')||(LA2_1>='C'  
&& LA2_1<='\uFFFE')) ) {
                 alt2=2;
             }

You might also like to hear that ANTLR v3 sports an Objective-C  
target (hint, hint...;))

HTH,

-k

-- 
Kay Röpke <kroepke at classdump.org>
classdump Software
Key fingerprint = A849 0F2C C322 4022 379E  8661 7E1B FE0D 4CD2 A6D0





More information about the antlr-interest mailing list