[antlr-interest] Parsing this ambiguous grammar

Bart Kiers bkiers at gmail.com
Fri Jan 27 10:48:39 PST 2012


On Fri, Jan 27, 2012 at 2:48 AM, Gerald Gutierrez <
gerald.gutierrez at gmail.com> wrote:

> ...
> Essentially, I've got two tokens defined:
>
> ID  :   ('a'..'z' | 'A'..'Z') ('0'..'9' | 'a'..'z' | 'A'..'Z' | ' ')*;
>
> PITCH
>    :   (('A'|'a') '#'?)
>    |   (('B'|'b') '#'?)
>    |   (('C'|'c') '#'?);
>
> Obviously, the letter "A" would be an ambiguity.
>

No matter what the parser "asks" of the lexer, the lexer will simply return
the longest match. And in case of a tie, it returns the match (token) that
is defined first. So in your case, "A", "B" and "C" (regardless of case)
will always be tokenized as an ID (assuming ID is defined before PITCH as
you posted in your example). I wouldn't call it ambiguous.
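
To make the rule above concrete, here is a minimal Python sketch of "longest
match wins, first-defined rule breaks ties". It is a toy model only, not
ANTLR's actual implementation: the RULES table and the next_token helper are
hypothetical names made up for this illustration, and the regular expressions
are hand translations of the ID and PITCH rules quoted above.

```python
import re

# Token rules in definition order, mirroring the grammar quoted above.
# (Hypothetical table for illustration; ANTLR builds its lexer differently.)
RULES = [
    ("ID", re.compile(r"[a-zA-Z][0-9a-zA-Z ]*")),
    ("PITCH", re.compile(r"[a-cA-C]#?")),
]

def next_token(text, pos=0, rules=RULES):
    """Return (name, lexeme) for the longest match at pos.

    On a tie in length, the rule listed first is kept, because a later
    rule only replaces the current best when it matches strictly more.
    Returns None if no rule matches.
    """
    best = None
    for name, pattern in rules:
        m = pattern.match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

# Both rules match one character of "A"; ID is listed first, so ID is kept.
print(next_token("A"))
# PITCH matches two characters of "A#", ID only one, so PITCH is kept.
print(next_token("A#"))
```

Under this model, listing PITCH before ID would make a lone "A" come out as
PITCH, since the tie would then go the other way.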

Also see:
http://stackoverflow.com/questions/9023015/proper-way-to-resolve-antlr-lexer-rule-ambiguities

Bart.

