[antlr-interest] Ambiguous lexer rules

Sun Jul 3 07:55:23 PDT 2011

I see your problem. The lexer is independent of the parser: it doesn't matter what the parser is expecting. Even though the parser is expecting a '[' followed by STRING+, the lexer just sees the character sequence '[aaa' and groups it into a single XSTRING token rather than a '[' token followed by a STRING token with the value 'aaa'. The lexer produces tokens based only on the incoming characters and the lexer rules, plus tie-breaking criteria such as preferring the longest match or the lexer rule that appears first (I can't remember all the details).
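To make the '[aaa' case concrete, here is a toy longest-match tokenizer in Python. It is only a rough model of the behaviour described above, not how ANTLR is implemented. The four rule definitions are assumptions, since the actual grammar isn't shown in this message; in particular I'm assuming XSTRING matches any run of non-whitespace characters.

```python
import re

# Hypothetical rules, listed in priority order (earlier wins a tie).
RULES = [
    ("LBRACK", re.compile(r"\[")),
    ("RBRACK", re.compile(r"\]")),
    ("STRING", re.compile(r"[a-z]+")),
    ("XSTRING", re.compile(r"\S+")),  # assumed: any non-whitespace run
]


def tokenize(text, rules=RULES):
    """Pick the longest match at each position; the parser is never consulted."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # skip whitespace between tokens
            continue
        best_name, best_match = None, None
        for name, pattern in rules:
            m = pattern.match(text, pos)
            # Strictly longer matches win; on a tie the earlier rule is kept.
            if m and (best_match is None or m.end() > best_match.end()):
                best_name, best_match = name, m
        if best_match is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, best_match.group()))
        pos = best_match.end()
    return tokens
```

With these assumed rules, tokenize("[aaa") yields a single XSTRING token, because XSTRING is the longest match starting at '[', whereas tokenize("aaa") yields a STRING token, because STRING and XSTRING tie and STRING is listed first.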

One possible solution is to use semantic predicates. There's an example of this at http://www.antlr.org/wiki/display/ANTLR3/1.+Lexer where, in the lexing of XML, a tagMode boolean variable is updated whenever the tag delimiters ('<' and '>') are seen. Other lexer rules can then have (gated) semantic predicates that enable or disable the rule depending on whether the predicate (tagMode) is true or false, respectively.
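The idea behind the tagMode flag can be sketched in Python as well. This is a simplified model of the technique, not the grammar from the wiki page: the rule names, the patterns, and the third column (which says what value the flag must have for the rule to be enabled) are all my own assumptions for illustration.

```python
import re

# (token name, pattern, value tag_mode must have for the rule to be enabled)
# The third column stands in for a gated predicate on the rule.
RULES = [
    ("TAG_OPEN", re.compile(r"</?"), False),
    ("PCDATA", re.compile(r"[^<]+"), False),
    ("TAG_CLOSE", re.compile(r">"), True),
    ("NAME", re.compile(r"[A-Za-z]+"), True),
]


def tokenize(text):
    """Tokenize with a mode flag that switches groups of rules on and off."""
    tokens = []
    tag_mode = False
    pos = 0
    while pos < len(text):
        for name, pattern, needs_tag_mode in RULES:
            if needs_tag_mode != tag_mode:
                continue  # predicate is false, so the rule is switched off
            m = pattern.match(text, pos)
            if m:
                break
        else:
            raise ValueError(f"no enabled rule matches at position {pos}")
        # Actions attached to the delimiter rules flip the flag.
        if name == "TAG_OPEN":
            tag_mode = True
        elif name == "TAG_CLOSE":
            tag_mode = False
        tokens.append((name, m.group()))
        pos = m.end()
    return tokens
```

In this sketch the same character 'b' becomes a NAME token inside a tag and part of a PCDATA token outside one, purely because of the flag.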

You could try the same thing with lexer rules for '[' and ']'. Currently it doesn't look like you have explicit lexer rules for '[' and ']'; you would need them for this technique, so that they can set and clear the flag. Then give the XSTRING rule a gated semantic predicate of {!inArray}?=> (assuming your boolean is called 'inArray'). One of the last chapters in the ANTLR book has some good stuff on semantic predicates (in the context of parsers, but you can use them in lexers too).

An alternative *workaround* to your problem would be to change your XSTRING lexer rule to exclude the ASCII characters '[' and ']'. I think that would work.
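Here is a rough Python sketch of that workaround, again using a toy longest-match tokenizer rather than ANTLR itself. The rule definitions are assumptions (the real grammar isn't shown in this message); the only point is that XSTRING's character class no longer contains '[' or ']'.

```python
import re

# Hypothetical rules, listed in priority order (earlier wins a tie).
RULES = [
    ("LBRACK", re.compile(r"\[")),
    ("RBRACK", re.compile(r"\]")),
    ("STRING", re.compile(r"[a-z]+")),
    # Assumed XSTRING: non-whitespace characters, now excluding '[' and ']'.
    ("XSTRING", re.compile(r"[^\s\[\]]+")),
]


def tokenize(text):
    """Pick the longest match at each position, earlier rules winning ties."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # skip whitespace between tokens
            continue
        best_name, best_match = None, None
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m and (best_match is None or m.end() > best_match.end()):
                best_name, best_match = name, m
        if best_match is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, best_match.group()))
        pos = best_match.end()
    return tokens
```

Under these assumptions, tokenize("[aaa bbb]") gives LBRACK, STRING, STRING, RBRACK, since XSTRING can no longer swallow the brackets. The trade-off is that an XSTRING can then never contain '[' or ']' anywhere, which may or may not be acceptable for your input.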

Hope that helps,
Stephen