[antlr-interest] greedy subrule option idiom

Junkman j at junkwallah.org
Fri May 28 11:21:29 PDT 2010


Here is another variation of the grammar:

------------------

grammar Test;

fragment
CHAR	:	'\u0000'..'#' | '$'..'\uffff' ;

STRING	:	'##' ( options {greedy=false;} : CHAR )* '##' ;

stmt	:	
	( . )+
	;


--------------------

This generates a grammar check error just like the one in my previous post
(quoted at the bottom).

The error goes away if I pull the character '#' out of CHAR and inline
it into STRING with the '|' operator next to CHAR, like this:

--------------------

grammar Test;

fragment
CHAR	:	'\u0000'..'"' | '$'..'\uffff' ;

STRING	:	'##' ( options {greedy=false;} : CHAR | '#' )* '##' ;

stmt	:	
	( . )+
	;

---------------------

It looks like the DFA needs '#' at the top level of the non-greedy subrule
because that character also matches the beginning of the exit branch (and
hence requires more lookahead to decide).
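
To make the lookahead point concrete, here is a rough hand-written sketch
in Python of the loop-versus-exit decision for the '##' ... '##' token.
It is only an illustration of why a single '#' is ambiguous, not what
ANTLR actually generates, and the function name match_hash_string is made
up for this example.

```python
def match_hash_string(text, start=0):
    """Return the index just past a '##' ... '##' token, or -1 if none.

    Hypothetical helper: mimics a non-greedy loop that exits at the
    first closing '##'.
    """
    if not text.startswith("##", start):
        return -1
    i = start + 2
    while i < len(text):
        # Seeing one '#' is not enough to decide: it may be content or
        # the first half of the closing delimiter. Two characters of
        # lookahead settle it.
        if text.startswith("##", i):
            return i + 2  # exit branch: consume the closing '##'
        i += 1  # loop branch: consume one character, a lone '#' included
    return -1  # input ended before a closing '##' was seen
```

For example, match_hash_string("##a#b##") accepts the whole input, since
the lone '#' before 'b' is treated as content, while "##abc" is rejected
because the closing delimiter never appears.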

I'd like to know if this is known (and consistent) behavior.  Or perhaps
I'm way off because I missed something very basic in the grammars above.

I did a quick search of the list archive using the MarkMail link Jim
provided and found a recent thread on non-greedy loops, but it
concerns a suggestion for v4, and I'm not sure it's directly applicable
to this question.

Sorry if it seems like I'm beating a dead horse.  Being a noob makes me
want to dot every i and j twice.

Junkman wrote:
> Hello,
> 
> The following grammar generates an error:
> 
> ---------------------
> grammar Test;
> 
> fragment
> CHAR	:	. ;
> 
> STRING	:	'"' ( options {greedy=false;} : CHAR )* '"' ;
> 
> stmt	:	
> 	( . )+
> 	;
> 
> ---------------------
> 
> The error message generated by the "Check Grammar" option of ANTLRWorks (1.4) is:
> 
> [15:34:52] error(201): Test.g:6:47: The following alternatives can never
> be matched: 2
> 
> I think it means it cannot exit the non-greedy subrule (of the lexer
> rule STRING).
> 
> If I substitute "." directly for "CHAR", there is no error.
> 
> Is this the expected behavior?  Is there a problem with the grammar
> given above?
> 
> Thanks for any insight/assistance.
> 
> J
> 
> Junkman wrote:
>> Hello,
>>
>> Following is a lexer rule to match a quoted string that allows backslash
>> escape sequences.
>>
>>
>> STRING
>> 	: 	 '"' ( options {greedy=false;} : ( ~ '\\' | '\\' . ) )* '"'
>> 	;
>>
>>
>> It seems to work.  But if you put the '*' operator inside the subrule
>> like this:
>>
>>
>> STRING
>> 	: 	 '"' ( options {greedy=false;} : ( ~ '\\' | '\\' . )* ) '"'
>> 	;
>>
>>
>> It eats up everything to EOF.
>>
>> It's as if the greedy option applies to the ((subrule)*) instead of the
>> subrule itself, and only if the subrule is suffixed with the '*' operator
>> (or with '+') externally (as in (subrule)*).
>>
>> To my eyes, the second version seems the "correct" one.
>>
>> Thoughts?
>>
>> J
>>
> 
> 
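
The greedy versus non-greedy loop difference described in the oldest
quoted message can be loosely reproduced with Python regular expressions.
This is only an analogy: a regex engine backtracks while an ANTLR lexer
does not, so the greedy pattern here stops at the last quote rather than
running to EOF. The pattern names and the sample text are made up for
illustration.

```python
import re

# Sample input: an escaped quote inside the first string, then a second string.
text = r'"a\"b" "c"'

# Non-greedy loop: leave the loop at the first unescaped closing quote.
NON_GREEDY = re.compile(r'"(?:[^\\]|\\.)*?"', re.DOTALL)

# Greedy loop: [^\\] also matches '"', so the loop swallows the closing
# quote and keeps going as far as it can.
GREEDY = re.compile(r'"(?:[^\\]|\\.)*"', re.DOTALL)

first_non_greedy = NON_GREEDY.match(text).group(0)
first_greedy = GREEDY.match(text).group(0)
```

Here first_non_greedy covers only the first quoted string, while
first_greedy spans both strings, because nothing inside the loop stops it
from consuming a quote character.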


