[antlr-interest] Why do html comments ruin my grammar?

Ruth Karl ruth.karl at gmx.de
Sat Jun 30 04:23:52 PDT 2007


Gavin Lambert wrote:
> At 19:03 30/06/2007, Ruth Karl wrote:
> >Hi, I wonder if this message has ever been read or if I shall
> >send it again? Does anyone have an idea about this problem? I
> >really need some help there....
> [...]
> >> But when I add the lexer rule
> >>
> >> HTMLCOMMENT    :    '<!--' ( options {greedy=false;} : . )* '-->' {$channel=HIDDEN;}    ;
> >>
> >> to my grammar (see attachment), the interpreter in ANTLRworks
> >> will start to see '<!'  (like in '<!DOCTYPE html ...') as part
> >> of a TEXT item, even though TEXT is defined as
> >>
> >> TEXT          options {greedy=false;}
> >>          :
> >>(~('<'|'>'|'%'|'/'|'"'|'\''|'('|')'|'['|']'|'{'|'}'|'\n'|'\t'|'\r'))+
> >>          ;
> >>
> >> which is confusing not only me but the parser as well... ;-)
>
> Try removing the greedy option from the TEXT rule.  I don't think it 
> will actually work there, since that's a top-level lexer rule and you 
> don't have any following characters within the rule itself.  (Though I 
> could be wrong.)
>
> But anyway, with those two rules you've posted, the ! will match TEXT, 
> assuming the < has already matched some other token.
>
Hi Gavin,

thanks a lot for your help. Removing the greedy option did not help,
but I have now found a solution myself (and it is so simple!): I just
added another lexer rule:

DOCTYPE    :    '<!DOCTYPE' ( options {greedy=false;} : . )* '>'    ;
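
For reference, here is a minimal sketch of how the two '<!' rules now sit
side by side in the lexer. Only the layout and the comments are new; the
rule bodies are the ones quoted in this thread. Keeping DOCTYPE on the
default channel is simply what I did, not a requirement, and I assume
(untested) that any other construct starting with '<!' would need a rule
of its own in the same way:

    // Comments are matched non-greedily up to the first '-->' and hidden
    // from the parser.
    HTMLCOMMENT
        :   '<!--' ( options {greedy=false;} : . )* '-->' {$channel=HIDDEN;}
        ;

    // The doctype declaration is matched non-greedily up to the first '>',
    // so '<!' no longer falls through to other rules such as TEXT.
    DOCTYPE
        :   '<!DOCTYPE' ( options {greedy=false;} : . )* '>'
        ;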

Thanks anyway, and have a nice day,
Ruth

