[antlr-interest] Beginner lexing question.
Gavin Lambert
antlr at mirality.co.nz
Sun Aug 3 13:32:00 PDT 2008
At 05:19 4/08/2008, Peter C. Chapin wrote:
>UNARY_OPERATOR
> : ('&' | '*' | '+' | '-' | '~' | '!');
>
>Now when I try to parse '*X' I get a NoViableAltException. However,
>if I replace UNARY_OPERATOR in the unary_expression rule with an
>explicit '*', things work (well... not the other unary operators, of
>course). That is:
>
>unary_expression
> : ... various irrelevant alternatives
> | '*' cast_expression;

This is the problem. By using '*' in a parser rule you have
created an implicit token similar to this:

  T42 : '*';

Now your lexer is ambiguous between T42 and UNARY_OPERATOR -- so
on seeing a '*' as input, the lexer will always emit one of them
(depending on the order ANTLR sees the rules in) and never the
other, which breaks any parser rule that expects the other token.

Ideally, when starting out with ANTLR you should avoid combined
grammars (or at least avoid using quoted literals in parser
rules), since they lead to this kind of trap all too easily.

Probably the best thing to do to resolve this specific problem is
to make separate lexer rules for each operator symbol and then
change UNARY_OPERATOR into a parser rule.

Another useful rule of thumb is that where ambiguity exists, try
to avoid assigning semantic meaning in the lexer. (Sometimes it
can't be avoided due to whitespace-handling issues, but that makes
things complicated.)
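
As a rough sketch of the restructuring suggested above (one lexer rule per
operator symbol, with UNARY_OPERATOR turned into a parser rule), something
like the following should work. The token names (AMPERSAND, STAR, and so on)
and the rule name unary_operator are only illustrative choices, not taken
from your grammar, and the other alternatives of unary_expression are left
out because they were not shown:

```antlr
// Lexer rules: one token per symbol, with no meaning attached yet.
// (Token names here are hypothetical; pick whatever suits your grammar.)
AMPERSAND : '&';
STAR      : '*';
PLUS      : '+';
MINUS     : '-';
TILDE     : '~';
BANG      : '!';

// Parser rule: "unary operator" is now decided by the parser, from context.
unary_operator
    : AMPERSAND | STAR | PLUS | MINUS | TILDE | BANG
    ;

unary_expression
    : unary_operator cast_expression
    // | ... the other alternatives from the original rule, unchanged
    ;
```

With this layout the lexer only ever produces one token for '*', so the same
STAR token can be used elsewhere in the parser (for multiplication, say)
without the lexer having to guess which meaning was intended. This assumes
no quoted '*' (or other operator literal) remains in any parser rule, since
each one would reintroduce an implicit token.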