[antlr-interest] Parsing this ambiguous grammar

Fri Jan 27 10:48:39 PST 2012

On Fri, Jan 27, 2012 at 2:48 AM, Gerald Gutierrez <gerald.gutierrez at gmail.com> wrote:

> ...
> Essentially, I've got two tokens defined:
>
> ID  :   ('a'..'z' | 'A'..'Z') ('0'..'9' | 'a'..'z' | 'A'..'Z' | ' ')*;
>
> PITCH
>    :   (('A'|'a') '#'?)
>    |   (('B'|'b') '#'?)
>    |   (('C'|'c') '#'?);
>
> Obviously, the letter "A" would be an ambiguity.
>

No matter what the parser "asks" of the lexer, the lexer simply returns the
longest match. In case of a tie, it returns the token whose rule is defined
first. So in your case, "A", "B" and "C" (regardless of case) will always be
tokenized as an ID (assuming ID is defined before PITCH, as in the example
you posted). I wouldn't call that ambiguous: the lexer resolves the overlap
deterministically.
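
To make the rule concrete, here is a minimal Python sketch of that selection
logic. It only models the "longest match, then first-defined rule" behaviour
described above; it is not ANTLR's actual implementation. The names RULES and
next_token are made up for illustration, and the two regular expressions are
my own translation of the ID and PITCH rules quoted above.

```python
import re

# Token rules in definition order (ID before PITCH, as in the quoted grammar).
# The patterns are a hand translation of the lexer rules, not ANTLR output.
RULES = [
    ("ID", re.compile(r"[A-Za-z][0-9A-Za-z ]*")),
    ("PITCH", re.compile(r"[A-Ca-c]#?")),
]


def next_token(text, rules=RULES, pos=0):
    """Return (rule_name, lexeme) for the longest match starting at pos.

    Ties go to the rule that appears first in `rules`.
    Returns None when no rule matches at pos.
    """
    best = None
    for name, pattern in rules:
        m = pattern.match(text, pos)
        # Strict '>' keeps the earlier rule when two matches have equal length.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


if __name__ == "__main__":
    for sample in ("A", "b", "A#"):
        print(repr(sample), "->", next_token(sample))
```

Under this model, "A" and "b" come out as ID because both rules match a single
character and ID is listed first, while "A#" comes out as PITCH because that
match is one character longer. Passing the rules in the opposite order, e.g.
next_token("A", rules=list(reversed(RULES))), gives PITCH instead, which is
why the order of definition matters.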

Also see:
http://stackoverflow.com/questions/9023015/proper-way-to-resolve-antlr-lexer-rule-ambiguities

Bart.