[antlr-interest] Parsing this ambiguous grammar
Bart Kiers
bkiers at gmail.com
Fri Jan 27 10:48:39 PST 2012
On Fri, Jan 27, 2012 at 2:48 AM, Gerald Gutierrez <gerald.gutierrez at gmail.com> wrote:
> ...
> Essentially, I've got two tokens defined:
>
> ID : ('a'..'z' | 'A'..'Z') ('0'..'9' | 'a'..'z' | 'A'..'Z' | ' ')*;
>
> PITCH
> : (('A'|'a') '#'?)
> | (('B'|'b') '#'?)
> | (('C'|'c') '#'?);
>
> Obviously, the letter "A" would be an ambiguity.
>
No matter what the parser "asks" of the lexer, the lexer will simply return
the longest match. And in case of a tie, it returns the match (token) that
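is defined first. So in your case, "A", "B" and "C" (regardless of case)
will always be tokenized as an ID (assuming ID is defined before PITCH, as
in the example you posted). A PITCH is only produced when the letter is
followed by '#': for the input "A#", PITCH matches two characters while ID
matches just one, because ID cannot contain '#'. I wouldn't call it ambiguous.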
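To make the "longest match, then first-defined rule wins" behaviour concrete, here is a small Python model of it. This is a simplified sketch, not the code ANTLR generates: the regex translations of ID and PITCH and the `tokenize` helper are hypothetical, and they assume the rule order from your example (ID before PITCH).

```python
import re

# Hypothetical regex translations of the two quoted ANTLR rules,
# kept in the same order as in the grammar: ID first, then PITCH.
RULES = [
    ("ID", re.compile(r"[a-zA-Z][0-9a-zA-Z ]*")),
    ("PITCH", re.compile(r"[AaBbCc]#?")),
]

def tokenize(text):
    """Split text into (type, lexeme) pairs using longest match,
    breaking ties in favour of the rule that is defined first."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_type, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Strictly greater: an equally long match from a later rule
            # does not replace the earlier rule's match (the tie-break).
            if m and len(m.group()) > best_len:
                best_type, best_len = name, len(m.group())
        if best_type is None:
            raise ValueError(f"no rule matches at position {pos}: {text[pos]!r}")
        tokens.append((best_type, text[pos:pos + best_len]))
        pos += best_len
    return tokens
```

With this model, `tokenize("A")` yields `[("ID", "A")]` because both rules match one character and ID comes first, while `tokenize("A#")` yields `[("PITCH", "A#")]` because only PITCH can consume the '#'.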
Also see:
http://stackoverflow.com/questions/9023015/proper-way-to-resolve-antlr-lexer-rule-ambiguities
Bart.