[antlr-interest] Re: lexer "modes" for XML parsing etc...

Sat Nov 19 13:19:08 PST 2005

Hi Terence,

I suppose something like this would work as a quick hack, but you don't
consider it a permanent fix, right? Because it's ugly...

What you describe for v3 final really sounds like lexer modes, which,
as far as I remember, you don't like?! Finally giving in? Hihihihi ;)

Oliver

2005/11/19, Terence Parr <parrt at cs.usfca.edu>:
> Hi Oliver,
>
> I'm in a situation where, for v3, I need an island grammar for the
> stuff to the right of the rewrite "->" symbol to handle string
> templates.  So, I have to think about modes or switching streams or
> something to deal with different contexts within the same stream (I'm
> ignoring include file type switching for this email).
>
> Your problem was that you wanted to gate certain rules in/out for XML
> parsing (inside/outside of a tag), right?  Can you simply define
> rules as usual for outside of a tag and then another rule that says
> which rules can be in the tag:
>
> class L extends Lexer;
> {protected boolean insideTag=false;}
> // normal stuff (is put into mTokens() method)
> PCDATA : ... ;
> CDATA : ... ;
> COMMENT : ... ;
> OPEN : '<' {insideTag=true;} ;
>
> // inside a tag (keep these rules, TAG_TOKENS included, out of mTokens(), e.g. make them protected)
> ID : ... ;
> EQ : '=' ;
> STRING : ... ;
> CLOSE : '>' {insideTag=false;} ;
> TAG_TOKENS : ID | EQ | STRING | CLOSE | ... ;
> ...
>
> Then in a subclass of L, do this:
>
> class SL extends L {
>    // Gate: while inside a tag, only the tag rules are tried
>    public Token mTokens() {
>      if ( insideTag ) {
>        return mTAG_TOKENS();
>      }
>      else {
>        return super.mTokens(); // normal, outside-of-tag rules
>      }
>    }
> }
>
> Will something like that work?  It avoids predicates in the lexer;
> we add the test manually in code so it really acts as a gate.
>
> In v3, perhaps we can formalize this situation (single input stream,
> multiple contexts) by allowing you to set the start rule for the lexer.
> The default is Tokens, but you can define another tokens rule and then
> have an action set the next start rule (for when the lexer is asked
> to emit a token again).
>
> Ter
>
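A minimal, self-contained Java sketch of the gate described above, assuming a hand-written lexer rather than ANTLR-generated code. The class ModeLexer, its methods, the StartRule enum and the "TYPE:text" token strings are all hypothetical. nextToken() only ever tries the rules of the current mode, and the actions on '<' and '>' pick the start rule used for the next call, which is roughly what "have an action set the next start rule" would amount to.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not ANTLR output, names are made up.
final class ModeLexer {
    enum StartRule { TOKENS, TAG_TOKENS } // default rule vs. inside-a-tag rule

    private final String input;
    private int pos = 0;
    private StartRule startRule = StartRule.TOKENS; // rule tried on the next call

    ModeLexer(String input) { this.input = input; }

    /** Returns the next token as "TYPE:text", or null at end of input. */
    String nextToken() {
        if (pos >= input.length()) return null;
        // The gate: only the rules of the current mode are tried.
        return startRule == StartRule.TAG_TOKENS ? mTagTokens() : mTokens();
    }

    // Outside a tag: '<' opens a tag; anything up to the next '<' is PCDATA.
    private String mTokens() {
        if (input.charAt(pos) == '<') {
            pos++;
            startRule = StartRule.TAG_TOKENS; // action: switch start rule
            return "OPEN:<";
        }
        int start = pos;
        while (pos < input.length() && input.charAt(pos) != '<') pos++;
        return "PCDATA:" + input.substring(start, pos);
    }

    // Inside a tag: ID, EQ, STRING, CLOSE; whitespace is skipped.
    private String mTagTokens() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) pos++;
        if (pos >= input.length()) return null;
        char c = input.charAt(pos);
        if (c == '>') {
            pos++;
            startRule = StartRule.TOKENS; // action: back to the default rule
            return "CLOSE:>";
        }
        if (c == '=') { pos++; return "EQ:="; }
        if (c == '"') {
            int start = pos++;
            while (pos < input.length() && input.charAt(pos) != '"') pos++;
            pos = Math.min(pos + 1, input.length()); // consume closing quote if present
            return "STRING:" + input.substring(start, pos);
        }
        int start = pos;
        while (pos < input.length() && Character.isLetterOrDigit(input.charAt(pos))) pos++;
        if (pos == start) throw new IllegalStateException("unexpected '" + c + "' in tag");
        return "ID:" + input.substring(start, pos);
    }

    static List<String> tokenize(String s) {
        ModeLexer lx = new ModeLexer(s);
        List<String> out = new ArrayList<>();
        for (String t = lx.nextToken(); t != null; t = lx.nextToken()) out.add(t);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("hi <a href=\"x\">there"));
    }
}
```

With the input hi <a href="x">there, the '=' inside the tag comes back as EQ, while the same character outside a tag would simply be part of a PCDATA token, because the outside-of-tag rule never looks for it. The boolean insideTag flag from the quoted grammar is just the two-mode special case of the StartRule field.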