[antlr-interest] Why do html comments ruin my grammar? ;-)

Ruth Karl ruth.karl at gmx.de
Thu Jun 28 07:55:43 PDT 2007


Hello out there, I need some help....

I have spent hours trying to find a way to exclude HTML comments from 
further analysis in my JSP parser.
But when I add the lexer rule

HTMLCOMMENT    :    '<!--' ( options {greedy=false;} : . )* '-->' {$channel=HIDDEN;}    ;
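
To spell out the behaviour expected from this rule, here is a rough sketch 
outside ANTLR. It is Python, not grammar code, and only approximates the 
intent with a non-greedy regular expression; the name strip_html_comments 
is made up for illustration. The point is that only '<!--' ... '-->' spans 
should disappear, while something like '<!DOCTYPE html ...' stays untouched.

```python
import re

# Non-greedy match from '<!--' up to the first '-->' (like greedy=false);
# DOTALL lets '.' span newlines, as '.' does in the lexer rule.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_html_comments(source: str) -> str:
    """Drop HTML comments (hypothetical helper); leave all other input as-is."""
    return HTML_COMMENT.sub("", source)


sample = "<!DOCTYPE html><!-- hidden\nnote --><p>text</p><!-- b -->"
print(strip_html_comments(sample))
```

This says nothing about how the ANTLR lexer decides between rules; it only 
pins down which input the rule is supposed to hide.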

to my grammar (see attachment), the interpreter in ANTLRWorks starts 
treating '<!' (as in '<!DOCTYPE html ...') as part of a TEXT item, even 
though TEXT is defined as

TEXT          options {greedy=false;}
          :    (~('<'|'>'|'%'|'/'|'"'|'\''|'('|')'|'['|']'|'{'|'}'|'\n'|'\t'|'\r'))+
          ;

which confuses not only me but the parser as well... ;-)
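
As a sanity check on the TEXT definition itself, the same exclusion set can 
be written as a character class. This is again a Python sketch rather than 
ANTLR, and the name leading_text is hypothetical. Under this reading, a 
TEXT item can never contain '<', so it cannot legitimately swallow '<!'.

```python
import re

# Same exclusion set as the TEXT rule: one or more characters that are none of
# < > % / " ' ( ) [ ] { } newline, tab, carriage return.
TEXT = re.compile(r"""[^<>%/"'()\[\]{}\n\t\r]+""")


def leading_text(source: str) -> str:
    """Return the TEXT-like run at the start of source, or '' if there is none."""
    match = TEXT.match(source)
    return match.group(0) if match else ""


print(repr(leading_text("hello world<!DOCTYPE html>")))
print(repr(leading_text("<!DOCTYPE html>")))
```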


For the same reason, adding the HTMLCOMMENT lexer rule also causes 
problems with the generated C# (!) code:

a MismatchedTokenException is thrown in the mHTMLCOMMENT() method of 
the lexer class when it reaches the line
            Match("<!--");

I thought I should somehow add a backtracking option and some exception 
handling there, but I could not find out how (the backtracking option 
does not seem to be allowed there?).



I would really appreciate any kind of help, thanks a lot in advance!
Ruth


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: JSP.g
Url: http://www.antlr.org/pipermail/antlr-interest/attachments/20070628/c29eb673/attachment-0001.pl 

