[antlr-interest] Beginner lexing question.
Peter C. Chapin
pcc482719 at gmail.com
Sun Aug 3 10:19:45 PDT 2008
I'm building a parser for a C-like language and I've encountered an
issue that I think has something to do with the order in which ANTLR
tries to match rules. This situation is this...
In my expression grammar I have a rule
unary_expression
: ... various irrelevant alternatives
| UNARY_OPERATOR cast_expression;
Where near the bottom of the grammar file I have
UNARY_OPERATOR
: ('&' | '*' | '+' | '-' | '~' | '!');
Now when I try to parse '*X' I get a NoViableAltException. However, if I
replace UNARY_OPERATOR in the unary_expression rule with an explicit
'*', things work (well... not the other unary operators, of course).
That is:
unary_expression
: ... various irrelevant alternatives
| '*' cast_expression;
I have explicit mention of '*' elsewhere in my grammar (in the rule for
multiplicative expressions) so I thought that perhaps the lexer was
seeing a '*' on the input and returning the token used in the multiply
rule instead of a UNARY_OPERATOR token. Note that the multiply rule
appears above the definition of UNARY_OPERATOR in my grammar file.
However, if I change the definition of UNARY_OPERATOR to just
UNARY_OPERATOR
: '*';
It works! I'm at a loss to understand why including additional
alternatives for UNARY_OPERATOR would cause a problem during the parse
of '*X'. As a final test I put all the necessary alternatives directly
in the unary_expression rule like this:
unary_expression
: ... various irrelevant alternatives
| ('&' | '*' | '+' | '-' | '~' | '!') cast_expression;
This works fine as well (now I get a warning about the UNARY_OPERATOR
token definition being unreachable, but I understand that). In short
there is something about the way the lexer rules work that I'm not
getting. I'm hoping someone here might be able to shed some light on
this behavior.
Thanks in advance!
Peter
More information about the antlr-interest
mailing list