[antlr-interest] Catch all rule and warnings

Wed Dec 13 05:22:36 PST 2006

On Wed, Dec 13, 2006 at 09:36:21AM +0000, Andrei Vereha wrote:
> I have a question about warnings.
> 
> I took the "HTML Indexer" from the Grammar list section. When I generate
> the source with antlr, I get 3 warnings. Is there a way to avoid these
> warnings?
> 
> I have a similar problem to solve: I need to look for a lot of
> "custom" tags in an HTML file plus the HTML content (I can't ignore the
> HTML content!). If I use this approach, I will get more and more
> warnings. Is this the only solution?
> 
> In a simple case (a JSP file), where I need to recognize just the
> "<%" and "%>" tags, TEXT_BETWEEN_TAGS and TEXT (text outside tags), I
> made a grammar without warnings by writing a catch-all rule like this:
> 
> JSPSTART:"<%" { this.in_jsp=true; };
> JSPEND:"%>" { this.in_jsp=false; };
> 
> TEXT options{testLiterals=true} : (~( '<' |  '>' | '%'))+
> {
> if(this.in_jsp) $setType(TEXT_BETWEEN_TAGS);
> };
> 
> In a more complex case, where I need to recognize <TAG1>, <TAG2>, the
> TEXT rule will be impossible to write.

Do you really need to recognise these in the lexer?  I mean, does the
parse need to proceed differently depending on the name of the tag?

If you just need to run different actions based on the tag name, then I
would suggest having a generic TAG_NAME token and testing its contents
in the action (indeed, this might even allow you to add support for new
tag names at runtime without needing to change the grammar).
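
To make that suggestion a bit more concrete, here is a rough Java sketch
of the kind of helper a parser action could call once it has the text of
a generic TAG_NAME token. None of this comes from the HTML Indexer
grammar or from ANTLR itself: the TagDispatcher class and its register
and dispatch methods are names made up for illustration, and treating
tag names as case-insensitive is an assumption.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical helper: maps tag names to handlers so the grammar only
// needs one generic TAG_NAME token instead of one token per tag.
class TagDispatcher {
    private final Map<String, Consumer<String>> handlers = new HashMap<>();

    // Register (or replace) a handler for a tag name at runtime.
    // Assumption: tag names are matched case-insensitively.
    void register(String tagName, Consumer<String> handler) {
        handlers.put(tagName.toLowerCase(Locale.ROOT), handler);
    }

    // Called from the parser action with the TAG_NAME token's text.
    // Returns false when no handler is registered, so the caller can
    // fall back to treating the tag as ordinary HTML content.
    boolean dispatch(String tagName, String body) {
        Consumer<String> handler = handlers.get(tagName.toLowerCase(Locale.ROOT));
        if (handler == null) {
            return false;
        }
        handler.accept(body);
        return true;
    }
}
```

With something like this, supporting a new custom tag is a call to
register() rather than a grammar change, and unknown tags simply fall
through as plain content.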

Also, note that the simple approach you're looking at will not handle
complex input like,

  <%  out.print("%>");  %>

But hey, maybe that's fine :)
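
To show where the simple approach goes wrong on that input, here is a
small Java sketch, written outside of ANTLR, that compares a naive search
for "%>" with a scan that skips double-quoted string literals. The class
and method names are invented for this example, and it assumes you want
a "%>" inside a string literal not to end the block. It only handles
double-quoted strings with backslash escapes; comments and character
literals are ignored.

```java
// Hypothetical helper for illustration only; not part of any grammar
// mentioned in this thread.
class JspEndFinder {

    // Naive: index of the first "%>" at or after 'from', or -1.
    static int naiveEnd(String src, int from) {
        return src.indexOf("%>", from);
    }

    // Quote-aware: index of the first "%>" at or after 'from' that is
    // not inside a double-quoted string literal, or -1.
    static int quoteAwareEnd(String src, int from) {
        boolean inString = false;
        for (int i = from; i < src.length(); i++) {
            char c = src.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++;                 // skip the escaped character
                } else if (c == '"') {
                    inString = false;    // closing quote
                }
            } else if (c == '"') {
                inString = true;         // opening quote
            } else if (c == '%' && i + 1 < src.length()
                    && src.charAt(i + 1) == '>') {
                return i;
            }
        }
        return -1;
    }
}
```

On the input above, naiveEnd stops at the "%>" inside the string literal,
while quoteAwareEnd skips it and reports the final "%>".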

For an example of another XML-like language (which also currently fails to
handle complex input equivalent to the example above), see here,

  http://svn.badgers-in-foil.co.uk/metaas/trunk/src/main/antlr/uk/co/badgersinfoil/metaas/impl/parser/e4x/E4X.g

NB  This is an ANTLR 3 grammar.

ta,
dave

-- 
http://david.holroyd.me.uk/