[antlr-interest] Nondeterminism problem

Allan Odgaard allan.odgaard at gmail.com
Tue Oct 17 03:50:59 PDT 2006


Hi there,

I have this lexer:

    class format_lexer extends Lexer;
    options {
       k = 2;
       charVocabulary = '\u0001'..'\u00FF';
    }

    MARK:             'A' 'B';

    protected
    ESCAPED:          '\\' '\u0001'..'\u00FF';

    protected
    OTHER:            ~('A'|'\\');

    protected
    NOT_MARK:         'A' ~('B');

    ALL:              ESCAPED | OTHER | NOT_MARK;


When I run it through ANTLR I get this warning:

    warning:lexical nondeterminism between rules MARK and ALL upon
       k==1:'A'
       k==2:'B'

I don't really understand the problem here, so I'd hoped someone could help
me.

If I inspect the code generated for ALL then it looks like this:

    (LA(1) >= '\u0001' && LA(1) <= '\u00ff') && true

I can see how that match is ambiguous with the one for MARK:

    LA(1) == 'A' && LA(2) == 'B'

What surprises me is that ANTLR generates such broad match for ALL. It seems
it takes the union of all the LA(1) matches for the branches, then &&s that
with the union of all the LA(2)'s. Thus actually matching broader than what
the branches would handle.

Is this understanding correct? Is there a simple workaround for the above
lexer, or would it need to be resolved in the parser?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20061017/9a9cb5d9/attachment-0001.html 


More information about the antlr-interest mailing list