[antlr-interest] Parsing this ambiguous grammar
Jim Idle
jimi at temporal-wave.com
Fri Jan 27 12:25:14 PST 2012
Why stackoverflow? The answer is never indexed in markmail!
grammar jim;
test: (
pitch
| id
)+
EOF
;
pitch : PITCH;
id: ID ;
fragment ID :;
PITCH : ('A'|'a'|'B'|'b'|'C'|'c')
(
(' '|'\t'|'#')=> '#'?
| ('a'..'z' | 'A'..'Z') ('0'..'9' | 'a'..'z' | 'A'..'Z' )+ //
You can't have WS in ID
{ $type = ID; }
)
| ('d'..'z' | 'D'..'Z') ('0'..'9' | 'a'..'z' | 'A'..'Z' )*
{ $type = ID; }
;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {skip();}
;
However, I am taking your grammar at its word that A B or C must always be
PITCH. I also assume that you cannot have ' ' in your ID or you have no
way at all to disambiguate except for context. However, this may come from
a hand crafted context sensitive parser made by a musician not a
programmer and so you will have to jump through hoops in the parser to try
and make it work.
Jim
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Bart Kiers
> Sent: Friday, January 27, 2012 10:49 AM
> To: antlr-interest at antlr.org interest
> Subject: Re: [antlr-interest] Parsing this ambiguous grammar
>
> On Fri, Jan 27, 2012 at 2:48 AM, Gerald Gutierrez <
> gerald.gutierrez at gmail.com> wrote:
>
> > ...
> > Essentially, I've got two tokens defined:
> >
> > ID : ('a'..'z' | 'A'..'Z') ('0'..'9' | 'a'..'z' | 'A'..'Z' | '
> ')*;
> >
> > PITCH
> > : (('A'|'a') '#'?)
> > | (('B'|'b') '#'?)
> > | (('C'|'c') '#'?);
> >
> > Obviously, the letter "A" would be an ambiguity.
> >
>
> No matter what the parser "asks" of the lexer, the lexer will simply
> return the longest match. And in case of a tie, it returns the match
> (token) that is defined first. So in your case, "A", "B" and "C"
> (regardless of case) will always be tokenized as an ID (assuming ID is
> defined before PITCH as you posted in your example). I wouldn't call it
> ambiguous.
>
> Also see:
> http://stackoverflow.com/questions/9023015/proper-way-to-resolve-antlr-
> lexer-rule-ambiguities
>
> Bart.
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
> email-address
More information about the antlr-interest
mailing list