[antlr-interest] Question with greedy

Wed Sep 23 15:29:32 PDT 2009

On 09/23/2009 02:35 PM, Andreas Volz wrote:
> Hello,
>
> I wrote this grammar:
>
> startrule
> 	: (property comment property)*
> 	;
> comment
> 	: COMMENT { printf("Comment: \%s\n", $COMMENT.text->chars); }
> 	;
> 	
> COMMENT
>   	:  '/*' ( options {greedy=false;} : . )* '*/'
>   	;
> 	
> property
> 	: TOKEN { printf("Property: \%s\n", $TOKEN.text->chars);}
> 	;
>
> TOKEN
> 	: (ALPHA | DIGIT)+
> 	;
>
> fragment DIGIT  	
> 	: '0'..'9'
> 	;
> 	
> fragment ALPHA
> 	: 'a'..'z' | 'A'..'Z' |'@'|'.'| ' '
> 	;
>
> The input is:
>
> This is a test /* with a comment */ in the middle
> This is a test /* with a comment */ in the middle
> This is a test /* with a comment */ in the middle
>
> The result looks good, but some errors are printed out:
>
> test.txt(1) : lexer error 3 :
> 	 at offset 49, near char(0XA) :
> 	
> This is a test /* w
> test.txt(2) : lexer error 3 :
> 	 at offset 50, near char(0XA) :
> 	
> This is a test /* w
> test.txt(3) : lexer error 3 :
> 	 at offset 50, near char(0XA) :
> 	
>
> Property: This is a test
> Comment: /* with a comment */
> Property:  in the middle
> Property: This is a test
> Comment: /* with a comment */
> Property:  in the middle
> Property: This is a test
> Comment: /* with a comment */
> Property:  in the middle
>
> BTW: The line ending in this file is 0x0A.
>
> Could anyone explain this error and how to prevent it?
>    

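Well, you have not told the lexer what it should do with the newline
characters: no lexer rule matches 0x0A, so the lexer reports an error
each time it meets one. Add rules like these (I assume from your code
above that you are using the C target):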

NL : ('\r' | '\n')+ { $channel=HIDDEN; } ;
ANY : . { SKIP(); } ; // Always make this the very last lexer rule
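
To see why the error shows up at each line end, here is a rough model in
plain C of which rule can begin at a given character. It is only a
sketch of the idea, not what ANTLR generates; first_rule_original() and
first_rule_fixed() are names I made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

// Characters accepted by the ALPHA fragment: letters, '@', '.' and space.
static bool is_alpha_fragment(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           c == '@' || c == '.' || c == ' ';
}

// Characters accepted by the DIGIT fragment.
static bool is_digit_fragment(char c) {
    return c >= '0' && c <= '9';
}

// Hypothetical helper: name of the rule that could start a token at s
// with the original grammar, or NULL when no rule applies (lexer error).
const char *first_rule_original(const char *s) {
    assert(s != NULL);
    if (s[0] == '/' && s[1] == '*') return "COMMENT";
    if (is_alpha_fragment(s[0]) || is_digit_fragment(s[0])) return "TOKEN";
    return NULL;
}

// Same model with the NL and ANY rules added after the existing ones.
const char *first_rule_fixed(const char *s) {
    const char *rule = first_rule_original(s);
    if (rule != NULL) return rule;
    if (s[0] == '\r' || s[0] == '\n') return "NL";
    if (s[0] != '\0') return "ANY";   // catch-all for everything else
    return NULL;                      // end of input
}
```

In this model first_rule_original("\n") comes back NULL, which
corresponds to the error you see, while first_rule_fixed("\n") reports
NL.
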

> Second question: How do I not include the '/*' and '*/' tags in the
> comment match?
>    
Cheat (off the top of my head):

COMMENT
	: '/*' { $start = $pos; }
	  ( options {greedy=false;} : . )*
	  { EMIT(); }
	  '*/'
	;

I think it is $pos, but you might need to use GETCHARPOSITIONINLINE()
rather than $pos.
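
If the lexer trick proves awkward, another option is to leave COMMENT
as it is and trim the delimiters in the parser action instead. A minimal
sketch in plain C, assuming the comment text is a NUL-terminated string
(which is how your printf already treats $COMMENT.text->chars);
comment_body() is a hypothetical helper of mine, not part of the ANTLR
runtime:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: for text that starts with the comment opener and
// ends with the comment closer, point *body just past the opener and
// return the number of characters that precede the closer.
// Text without both delimiters is returned unchanged.
size_t comment_body(const char *text, const char **body) {
    assert(text != NULL && body != NULL);
    size_t len = strlen(text);
    if (len < 4 ||
        strncmp(text, "/*", 2) != 0 ||
        strncmp(text + len - 2, "*/", 2) != 0) {
        *body = text;          // not a delimited comment: leave as is
        return len;
    }
    *body = text + 2;          // skip the two-character opener
    return len - 4;            // drop opener and closer from the length
}
```

In the comment rule you would then call it on the token text (with a
cast to const char * if your compiler asks for one) and print the
result with printf("Comment: \%.*s\n", (int)n, body), where n is the
returned length.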

Jim