[antlr-interest] Parsing this ambiguous grammar

Jim Idle jimi at temporal-wave.com
Fri Jan 27 12:25:14 PST 2012


Why stackoverflow? The answer is never indexed in markmail!

grammar jim;

test: (
		  pitch
		| id
	  )+
	  EOF
;

pitch : PITCH;
id: ID ;

fragment ID :;
PITCH : ('A'|'a'|'B'|'b'|'C'|'c')
        (
            (' '|'\t'|'#')=> '#'?
          | ('a'..'z' | 'A'..'Z') ('0'..'9' | 'a'..'z' | 'A'..'Z' )+ //
You can't have WS in ID
            { $type = ID; }
        )
      | ('d'..'z' | 'D'..'Z') ('0'..'9' | 'a'..'z' | 'A'..'Z' )*
            { $type = ID; }
;


WS  :   ( ' '
        | '\t'
        | '\r'
        | '\n'
        ) {skip();}
    ;

However, I am taking your grammar at its word that A B or C must always be
PITCH. I also assume that you cannot have ' ' in your ID or you have no
way at all to disambiguate except for context. However, this may come from
a hand crafted context sensitive parser made by a musician not a
programmer and so you will have to jump through hoops in the parser to try
and make it work.

Jim




> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Bart Kiers
> Sent: Friday, January 27, 2012 10:49 AM
> To: antlr-interest at antlr.org interest
> Subject: Re: [antlr-interest] Parsing this ambiguous grammar
>
> On Fri, Jan 27, 2012 at 2:48 AM, Gerald Gutierrez <
> gerald.gutierrez at gmail.com> wrote:
>
> > ...
> > Essentially, I've got two tokens defined:
> >
> > ID  :   ('a'..'z' | 'A'..'Z') ('0'..'9' | 'a'..'z' | 'A'..'Z' | '
> ')*;
> >
> > PITCH
> >    :   (('A'|'a') '#'?)
> >    |   (('B'|'b') '#'?)
> >    |   (('C'|'c') '#'?);
> >
> > Obviously, the letter "A" would be an ambiguity.
> >
>
> No matter what the parser "asks" of the lexer, the lexer will simply
> return the longest match. And in case of a tie, it returns the match
> (token) that is defined first. So in your case, "A", "B" and "C"
> (regardless of case) will always be tokenized as an ID (assuming ID is
> defined before PITCH as you posted in your example). I wouldn't call it
> ambiguous.
>
> Also see:
> http://stackoverflow.com/questions/9023015/proper-way-to-resolve-antlr-
> lexer-rule-ambiguities
>
> Bart.
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
> email-address


More information about the antlr-interest mailing list