[antlr-interest] Why do html comments ruin my grammar?

Gavin Lambert antlr at mirality.co.nz
Sat Jun 30 03:16:41 PDT 2007


At 19:03 30/06/2007, Ruth Karl wrote:
 >Hi, I wonder if this message has ever been read, or if I should
 >send it again? Does anyone have an idea about this problem? I
 >really need some help here...
[...]
 >> But when I add the lexer rule
 >>
 >> HTMLCOMMENT
 >>     :   '<!--' ( options {greedy=false;} : . )* '-->' {$channel=HIDDEN;}
 >>     ;
 >>
 >> to my grammar (see attachment), the interpreter in ANTLRWorks
 >> starts to see '<!' (as in '<!DOCTYPE html ...') as part
 >> of a TEXT item, even though TEXT is defined as
 >>
 >> TEXT
 >>     options {greedy=false;}
 >>     :   (~('<'|'>'|'%'|'/'|'"'|'\''|'('|')'|'['|']'|'{'|'}'|'\n'|'\t'|'\r'))+
 >>     ;
 >>
 >> which is confusing not only me but the parser as well... ;-)

Try removing the greedy option from the TEXT rule.  I don't think
it actually does anything there, since TEXT is a top-level lexer
rule and there are no following characters within the rule itself
for the loop to stop at.  (Though I could be wrong.)
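Following that suggestion, the two lexer rules from the question might look like this with the rule-level option dropped from TEXT. This is only a sketch in ANTLR 3 syntax: the character set is copied unchanged from the quoted rule, and the non-greedy option stays on the `( ... )*` loop inside HTMLCOMMENT, where the '-->' that follows the loop gives it something to stop at.

```antlr
// HTMLCOMMENT: the non-greedy loop stops at the first '-->'
// because that literal follows the loop within the same rule.
HTMLCOMMENT
    :   '<!--' ( options {greedy=false;} : . )* '-->' {$channel=HIDDEN;}
    ;

// TEXT: same excluded character set as before, but without the
// rule-level greedy=false option, since nothing follows the loop.
TEXT
    :   (~('<'|'>'|'%'|'/'|'"'|'\''|'('|')'|'['|']'|'{'|'}'|'\n'|'\t'|'\r'))+
    ;
```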

But anyway, with those two rules you've posted, the '!' will match
TEXT, assuming the '<' has already been matched by some other token.