[antlr-interest] Question with greedy
Jim Idle
jimi at temporal-wave.com
Wed Sep 23 15:29:32 PDT 2009
On 09/23/2009 02:35 PM, Andreas Volz wrote:
> Hello,
>
> I wrote this grammar:
>
> startrule
> : (property comment property)*
> ;
> comment
> : COMMENT { printf("Comment: \%s\n", $COMMENT.text->chars); }
> ;
>
> COMMENT
> : '/*' ( options {greedy=false;} : . )* '*/'
> ;
>
> property
> : TOKEN { printf("Property: \%s\n", $TOKEN.text->chars);}
> ;
>
> TOKEN
> : (ALPHA | DIGIT)+
> ;
>
> fragment DIGIT
> : '0'..'9'
> ;
>
> fragment ALPHA
> : 'a'..'z' | 'A'..'Z' |'@'|'.'| ' '
> ;
>
> The input is:
>
> This is a test /* with a comment */ in the middle
> This is a test /* with a comment */ in the middle
> This is a test /* with a comment */ in the middle
>
> The result looks good, but some errors are printed out:
>
> test.txt(1) : lexer error 3 :
> at offset 49, near char(0XA) :
>
> This is a test /* w
> test.txt(2) : lexer error 3 :
> at offset 50, near char(0XA) :
>
> This is a test /* w
> test.txt(3) : lexer error 3 :
> at offset 50, near char(0XA) :
>
>
> Property: This is a test
> Comment: /* with a comment */
> Property: in the middle
> Property: This is a test
> Comment: /* with a comment */
> Property: in the middle
> Property: This is a test
> Comment: /* with a comment */
> Property: in the middle
>
> BTW: The line ending in this file is 0x0A.
>
> Could anyone explain this error and how to prevent it?
>
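Well, you have not told the lexer what to do with the newline
characters. Spaces are fine because ALPHA includes ' ', but no rule
matches 0x0A, so each line ending is a lexer error. Add rules for them
(I assume this is the C target, from your code above):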
NL : ('\r' | '\n')+ { $channel=HIDDEN; } ;
ANY : . { SKIP(); } ; // Always make this the very last lexer rule
> Second question: How do I not include the '/*' and '*/' tags in the
> comment match?
>
Cheat (off the top of my head): move the token start past the '/*' and
emit the token before the '*/' is matched:
COMMENT : '/*' { $start = $pos; } ( options {greedy=false;} : . )* { EMIT(); } '*/' ;
I think it is $pos, but you might need to use GETCHARPOSITIONINLINE()
rather than $pos.
Jim