[antlr-interest] Beginner lexing question.

Johannes Luber jaluber at gmx.de
Sun Aug 3 13:22:50 PDT 2008


Peter C. Chapin wrote:
> I'm building a parser for a C-like language and I've encountered an 
> issue that I think has something to do with the order in which ANTLR 
> tries to match rules. The situation is this:
> 
> In my expression grammar I have a rule
> 
> unary_expression
>    :   ... various irrelevant alternatives
>    |   UNARY_OPERATOR cast_expression;
> 
> Where near the bottom of the grammar file I have
> 
> UNARY_OPERATOR
>    :   ('&' | '*' | '+' | '-' | '~' | '!');
> 
> Now when I try to parse '*X' I get a NoViableAltException. However, if I 
> replace UNARY_OPERATOR in the unary_expression rule with an explicit 
> '*', things work (well... not the other unary operators, of course). 
> That is:
> 
> unary_expression
>    :   ... various irrelevant alternatives
>    |   '*' cast_expression;
> 
> I have explicit mention of '*' elsewhere in my grammar (in the rule for 
> multiplicative expressions) so I thought that perhaps the lexer was 
> seeing a '*' on the input and returning the token used in the multiply 
> rule instead of a UNARY_OPERATOR token. Note that the multiply rule 
> appears above the definition of UNARY_OPERATOR in my grammar file.
> 
> However, if I change the definition of UNARY_OPERATOR to just
> 
> UNARY_OPERATOR
>    :   '*';
> 
> It works! I'm at a loss to understand why including additional 
> alternatives for UNARY_OPERATOR would cause a problem during the parse 
> of '*X'. As a final test I put all the necessary alternatives directly 
> in the unary_expression rule like this:
> 
> unary_expression
>    :   ... various irrelevant alternatives
>    |   ('&' | '*' | '+' | '-' | '~' | '!') cast_expression;
> 
> 
> This works fine as well (now I get a warning about the UNARY_OPERATOR 
> token definition being unreachable, but I understand that). In short 
> there is something about the way the lexer rules work that I'm not 
> getting. I'm hoping someone here might be able to shed some light on 
> this behavior.
> 
> Thanks in advance!
> 
> Peter
> 
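You fell into the trap of combined grammars. Don't use literals in
parser rules: ANTLR turns each one into its own implicit lexer rule,
even if an explicit lexer rule matches the same string. For '*' the
lexer then emits the implicit token instead of UNARY_OPERATOR, so the
parser never sees the token your rule expects. That only causes
problems, as you noticed.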
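
As a rough sketch of the usual fix, give each operator exactly one named
lexer rule and refer to the operators only by token name in the parser.
This is not taken from Peter's grammar: the grammar name, the token names
(AMP, STAR, PLUS, MINUS, TILDE, BANG) and the ID and WS rules are
assumptions, and unary_expression is cut down so that the example stands
alone. It assumes ANTLR 3 syntax with the default Java target:

```antlr
grammar UnaryDemo;   // hypothetical name, for illustration only

// Parser rules: no quoted literals, only token names.
multiplicative_expression
    :   unary_expression (STAR unary_expression)*
    ;

// Simplified stand-in for the original rule; the real grammar would
// use cast_expression and its other alternatives here.
unary_expression
    :   unary_operator unary_expression
    |   ID
    ;

// The operator set is grouped in a parser rule, so '*' stays a single
// token type (STAR) that both rules above can use.
unary_operator
    :   AMP | STAR | PLUS | MINUS | TILDE | BANG
    ;

// Lexer rules: exactly one rule per operator string.
AMP   : '&' ;
STAR  : '*' ;
PLUS  : '+' ;
MINUS : '-' ;
TILDE : '~' ;
BANG  : '!' ;

ID    : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')* ;
WS    : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;
```

With this layout '*' always arrives as STAR, and the parser decides from
context whether it is a multiplication or a unary operator. It also
suggests why the single-alternative UNARY_OPERATOR : '*' worked: as far
as I know, ANTLR 3 treats a lexer rule that consists of just one literal
as the name for that literal, so '*' in a parser rule and UNARY_OPERATOR
end up as the same token type.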

Johannes

