[antlr-interest] Why does the unused rule affect parser behaviour?

Tue Jan 10 02:59:27 PST 2012

Ok, for anyone else who runs into the same thing:
When I use character literals such as 'a' directly in parser rules, ANTLR
turns them into tokens of their own. Even though 'a' is a character that the
Letter_lowercase token normally covers, the lexer now matches it as that
separate token, so the parser hands an unexpected token type to any rule that
expects Letter_lowercase.
Lesson learned: do not use character literals in parser rules; use tokens.
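The precedence effect can be illustrated with a small Java sketch. It is a model, not ANTLR itself: each lexer rule is a single-character matcher, the implicit token for 'a' (called T_a here, a hypothetical name; ANTLR 3 generates names such as T__12) is listed ahead of the explicit rules, and the first rule that matches wins, which is how ties between equally long matches are broken. It assumes, as the post observes, that the literal 'a' ended up as a token of its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class ImplicitTokenDemo {
    // A single-character lexer rule: a token name plus the characters it accepts.
    // (Records need Java 16 or later.)
    record Rule(String name, IntPredicate matches) {}

    // Rules in priority order. The implicit token for the literal 'a'
    // (hypothetical name T_a) comes first, ahead of the explicit lexer rules.
    static final List<Rule> RULES = List.of(
        new Rule("T_a", c -> c == 'a'),
        new Rule("Digit", c -> c >= '0' && c <= '9'),
        new Rule("Letter_uppercase", c -> c >= 'A' && c <= 'Z'),
        new Rule("Letter_lowercase", c -> c >= 'a' && c <= 'z'),
        new Rule("Underscore", c -> c == '_'));

    // Every rule matches exactly one character, so all matches tie on length
    // and the first matching rule in the list decides the token type.
    public static List<String> tokenize(String input) {
        List<String> types = new ArrayList<>();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String type = null;
            for (Rule r : RULES) {
                if (r.matches().test(c)) {
                    type = r.name();
                    break;
                }
            }
            if (type == null) {
                throw new IllegalArgumentException("no rule matches '" + c + "'");
            }
            types.add(type);
        }
        return types;
    }

    // Models: rul : alphanumeric* ;
    //         alphanumeric : Letter_lowercase | Letter_uppercase | Digit ;
    public static boolean parsesAsRul(List<String> types) {
        for (String t : types) {
            if (!t.equals("Letter_lowercase") && !t.equals("Letter_uppercase")
                    && !t.equals("Digit")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String input : new String[] {"aa", "dd"}) {
            List<String> types = tokenize(input);
            System.out.println(input + " -> " + types + " parses: " + parsesAsRul(types));
        }
    }
}
```

Running it prints `aa -> [T_a, T_a] parses: false` and `dd -> [Letter_lowercase, Letter_lowercase] parses: true`, which matches the behaviour described in the original question: 'd' never became a literal token, so it still lexes as Letter_lowercase.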

Regards
Seref

On Tue, Jan 10, 2012 at 10:03 AM, Seref Arikan <serefarikan at kurumsalteknoloji.com> wrote:

> Greetings,
> The simple grammar below should be able to parse the input: aa
> When id_char_minus_t is commented out, it can. When it is included in the
> grammar, even if it is not used at all, it can't.
>
> I really don't understand what is going on here. Even weirder: when
> id_char_minus_t is included, it can parse the input dd.
> I'm clearly lost here, so I would really appreciate any input. Why is
> ANTLR doing this?
>
> grammar RecursionTests;
>
>
> rul    :  alphanumeric* ;
>
> //Identifier = {LetterMinusA}{IdCharMinusT}?{IdChar}* | 'a''t'?(({letter}|'_')*|{LetterMinusT}{Alphanumeric}*)
> /*
>
> identifier
>     :    ( letter_minus_a (letter_minus_t)? (id_char)* )
>     |     ( 'a' ('t')? ( ( (Letter_lowercase | Letter_uppercase |
>                 Underscore)* ) | (letter_minus_t (alphanumeric)*) ))
>     ;
> */
>
> letter_minus_a
>     :    {input.LT(1).getText().contains("a") == false &&
>          input.LT(1).getText().contains("A") == false}?
>          (Letter_lowercase | Letter_uppercase)
>     ;
>
>
> letter_minus_t
>     :    {input.LT(1).getText().contains("t") == false &&
>          input.LT(1).getText().contains("T") == false}?
>          (Letter_lowercase | Letter_uppercase)
>     ;
>
>
> id_char_minus_t
>     :    'a'..'s'| 'u'..'z' | 'A'..'S' | 'U'..'Z' | Digit | '_'
>     ;
>
>
> letter_or_underscore
>     :    Letter_lowercase | Letter_uppercase | Underscore
>     ;
>
> id_char
>     : Letter_lowercase | Letter_uppercase | Digit | Underscore
>     ;
>
>
> alphanumeric
>     :    Letter_lowercase | Letter_uppercase | Digit
>     ;
>
>
> Digit
>     :    '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
>     ;
>
> Letter_uppercase
>     :    'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J' | 'K'
>     |    'L' | 'M' | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' | 'T' | 'U' | 'V'
>     |    'W' | 'X' | 'Y' | 'Z'
>     ;
>
> Letter_lowercase
>     :    'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' | 'k'
>     |    'l' | 'm' | 'n' | 'o' | 'p' | 'q' | 'r' | 's' | 't' | 'u' | 'v'
>     |    'w' | 'x' | 'y' | 'z'
>     ;
>
> Underscore
>     :    '_'
>     ;