[antlr-interest] Question with greedy

Wed Sep 23 15:24:50 PDT 2009

Greetings!

On Wed, 2009-09-23 at 23:35 +0200, Andreas Volz wrote:
> Hello,
> 
> I wrote this grammar:
> 
> startrule
> 	: (property comment property)*
> 	;
> comment
> 	: COMMENT { printf("Comment: \%s\n", $COMMENT.text->chars); }
> 	;
> 	
> COMMENT
>  	:  '/*' ( options {greedy=false;} : . )* '*/'
>  	;
> 	
> property
> 	: TOKEN { printf("Property: \%s\n", $TOKEN.text->chars);}
> 	;
> 
> TOKEN
> 	: (ALPHA | DIGIT)+
> 	;
> 
> fragment DIGIT  	
> 	: '0'..'9'
> 	;
> 	
> fragment ALPHA
> 	: 'a'..'z' | 'A'..'Z' |'@'|'.'| ' ' 
> 	;
> 
> The input is:
> 
> This is a test /* with a comment */ in the middle
> This is a test /* with a comment */ in the middle
> This is a test /* with a comment */ in the middle
> 
> The result looks good, but some errors are print out:
> 
> test.txt(1) : lexer error 3 :
> 	 at offset 49, near char(0XA) :
> 	
> This is a test /* w
> test.txt(2) : lexer error 3 :
> 	 at offset 50, near char(0XA) :
> 	
> This is a test /* w
> test.txt(3) : lexer error 3 :
> 	 at offset 50, near char(0XA) :
> 	
> 
> Property: This is a test 
> Comment: /* with a comment */
> Property:  in the middle
> Property: This is a test 
> Comment: /* with a comment */
> Property:  in the middle
> Property: This is a test 
> Comment: /* with a comment */
> Property:  in the middle
> 
> BTW: The line ending in this file is 0x0A.
> 
> Could anyone explain this error and how to prevent it?
> 

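Your lexer has no rule that deals with your line-ending character (0x0A),
so it reports an error each time it reaches the end of a line. You need
to add a rule similar to the following (the parentheses make the action
apply to both alternatives):

NEWLINE : ( '\r' '\n'? | '\n' ) { $channel = HIDDEN; } ;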

> Second question: How do I not include the '/*' and '*/' tags in the
> comment match?
> 

There appears to be no unambiguous way to drop the '/*' and '*/' from
the syntax of your COMMENT rule itself; the lexer needs those delimiters
to tell where a comment starts and ends.

However, if you mean to remove those elements from the printout of the
comment text in the comment parser rule, then simply take the substring
between the first two and last two characters of
`$COMMENT.text->chars` inside the printf call (my C is very rusty; I
assume you know how to do that).

Alternatively you should be able to make the COMMENT rule throw away the
bracketing characters by, inside an action, using $getText to obtain the
string of the comment (including the '/*' and '*/'), substring out the
characters that you want and then use $setText to deposit that substring
back into the COMMENT Token.

Again, my C is rusty, and my knowledge of the C code generator is
sketchy at best, so please treat the above advice as conceptual
rather than as precise coding advice.

Hope this helps
   -jbb