[antlr-interest] Beginner lexing question.

Gavin Lambert antlr at mirality.co.nz
Sun Aug 3 13:32:00 PDT 2008


At 05:19 4/08/2008, Peter C. Chapin wrote:
 >UNARY_OPERATOR
 >    :   ('&' | '*' | '+' | '-' | '~' | '!');
 >
 >Now when I try to parse '*X' I get a NoViableAltException. However,
 >if I replace UNARY_OPERATOR in the unary_expression rule with an
 >explicit '*', things work (well... not the other unary operators, of
 >course). That is:
 >
 >unary_expression
 >    :   ... various irrelevant alternatives
 >    |   '*' cast_expression;

This is the problem.  By using '*' in a parser rule you have 
created an implicit token similar to this:
   T42 : '*';

Now your lexer is ambiguous between T42 and UNARY_OPERATOR. On 
seeing a '*' in the input, the lexer will always produce one of 
the two tokens (depending on the order ANTLR sees the rules in) 
and never the other, so any parser rule that expects the other 
token can never match.

Ideally, when starting out with ANTLR you should avoid combined 
grammars (or at least avoid using quoted literals in parser 
rules), since they lead to this kind of trap all too easily.

Probably the best thing to do to resolve this specific problem is 
to make separate lexer rules for each operator symbol and then 
change UNARY_OPERATOR into a parser rule.  Another useful rule of 
thumb is that where ambiguity exists, try to avoid assigning 
semantic meaning in the lexer.  (Sometimes it can't be avoided due 
to whitespace-handling issues, but that makes things complicated.)
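
A rough sketch of that restructuring, assuming ANTLR 3 syntax.  The 
token names (AMP, STAR, and so on) and the lower-case unary_operator 
rule name are made up for illustration and are not from the original 
grammar; the "..." is the same elision as in the quoted rule:

   // Lexer rules: one token per symbol, with no meaning attached yet.
   // (Token names here are hypothetical.)
   AMP   : '&';
   STAR  : '*';
   PLUS  : '+';
   MINUS : '-';
   TILDE : '~';
   BANG  : '!';

   // Parser rule (lower-case name): the grouping into "unary
   // operator" now happens in the parser, not the lexer.
   unary_operator
       :   AMP | STAR | PLUS | MINUS | TILDE | BANG
       ;

   unary_expression
       :   ... various irrelevant alternatives
       |   unary_operator cast_expression
       ;

With this layout each character maps to exactly one token type, so 
the lexer has nothing to choose between, and the same STAR token can 
still be used elsewhere in the parser (for example as a binary 
multiply) without a conflict.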


