[antlr-interest] Why do html comments ruin my grammar?

Ruth Karl ruth.karl at gmx.de
Sat Jun 30 00:03:25 PDT 2007


Hi, I wonder whether this message was ever read, or whether I should send 
it again. Does anyone have an idea about this problem? I really need some 
help here...
Thanks.
Ruth

Ruth Karl wrote:
> Hello out there, I need some help....
>
> I have spent hours trying to find a way to exclude HTML comments from 
> further analysis by my JSP parser.
> But when I add the lexer rule
>
> HTMLCOMMENT    :    '<!--' ( options {greedy=false;} : . )* '-->' 
> {$channel=HIDDEN;}    ;
>
> to my grammar (see attachment), the interpreter in ANTLRWorks starts 
> to treat '<!' (as in '<!DOCTYPE html ...') as part of a TEXT item, 
> even though TEXT is defined as
>
> TEXT          options {greedy=false;}
>          :    
> (~('<'|'>'|'%'|'/'|'"'|'\''|'('|')'|'['|']'|'{'|'}'|'\n'|'\t'|'\r'))+
>          ;
>
> which is confusing not only me but the parser as well... ;-)
>
>
> Presumably for the same reason, adding the HTMLCOMMENT lexer rule also 
> causes problems with the generated C# (!) code:
>
> a MismatchedTokenException is thrown in the mHTMLCOMMENT() method of 
> the lexer class when it reaches the line
>            Match("<!--");
>
> I thought I should somehow add a backtracking option and exception 
> handling there, but I could not find out how (the backtracking option 
> does not seem to be allowed in lexer rules?).
>
>
>
> I would really appreciate any kind of help, thanks a lot in advance!
> Ruth
>
>
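
For reference, the behaviour the HTMLCOMMENT rule above is meant to have can be sketched outside ANTLR. This is only an illustration in Python, not the attached grammar and not a model of how the ANTLR lexer chooses between rules. The names HTML_COMMENT and split_hidden are made up for this example. It shows a non-greedy match from '<!--' to the first '-->', with comments set aside (like a hidden channel) and '<!DOCTYPE ...' left untouched.

```python
import re

# Hypothetical stand-in for the HTMLCOMMENT rule: '<!--', then as few
# characters as possible (non-greedy), then the first '-->'.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def split_hidden(source):
    """Return (visible, comments).

    Comments are collected separately, roughly like tokens sent to a
    hidden channel; everything else is kept in the visible text.
    """
    comments = HTML_COMMENT.findall(source)
    visible = HTML_COMMENT.sub("", source)
    return visible, comments


sample = "<!DOCTYPE html><!-- a --><p>x</p><!-- b -->"
visible, comments = split_hidden(sample)
print(visible)   # '<!' in DOCTYPE is not a comment start, so it stays
print(comments)  # two separate comments, because the match is non-greedy
```

With a greedy match the two comments in the sample would be merged into one span that swallows the <p> element, which is why the rule uses greedy=false.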


