[antlr-interest] lexer "modes" for XML parsing etc...

Terence Parr parrt at cs.usfca.edu
Sat Nov 19 13:01:48 PST 2005


Hi Oliver,

I'm in a situation where, for v3, I need an island grammar for the  
stuff to the right of the rewrite "->" symbol to handle string  
templates.  So, I have to think about modes or switching streams or  
something to deal with different contexts within the same stream (I'm
ignoring include-file style stream switching for this email).

Your problem was that you wanted to gate certain rules in/out for XML  
parsing (inside/outside of a tag), right?  Can you simply define  
rules as usual for outside of a tag and then another rule that says  
which rules can be in the tag:

class L extends Lexer;
{protected boolean insideTag=false;}
// normal stuff (is put into mTokens() method)
PCDATA : ... ;
CDATA : ... ;
COMMENT : ... ;
OPEN : '<' {insideTag=true;} ;

// inside a tag
ID : ... ;
EQ : '=' ;
STRING : ... ;
CLOSE : '>' {insideTag=false;} ;
TAG_TOKENS : ID | EQ | STRING | CLOSE | ... ;
...

Then in a subclass of L, do this:

class SL extends L {
   public Token mTokens() {
     if ( insideTag ) {
       return mTAG_TOKENS();
     }
     else {
       return super.mTokens();
     }
   }
}

Will something like that work?  It keeps the predicates out of the lexer
grammar; we add the test manually in code instead, so it really acts as
a gate.

In v3, perhaps we can formalize this situation (single input stream,
multiple contexts) by allowing you to set the start rule for the lexer.
The default is Tokens, but you can define another tokens rule and then
have an action set the next start rule (used the next time the lexer is
asked to emit a token).
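
To make the start-rule idea concrete, here is a rough sketch in plain Java. It is a hand-written toy lexer, not ANTLR output and not a proposal for the real v3 API; the names ModeLexer, StartRule, tokens(), tagTokens() and lexAll() are all made up for illustration. The only point is that nextToken() dispatches on a "current start rule" field, and the actions for OPEN and CLOSE change that field.

```java
import java.util.ArrayList;
import java.util.List;

// Toy hand-written lexer, NOT ANTLR-generated code. All names are
// hypothetical; it only illustrates "an action sets the next start rule".
class ModeLexer {
    // The two "tokens rules" the lexer can start from.
    enum StartRule { TOKENS, TAG_TOKENS }

    private final String input;
    private int pos = 0;
    private StartRule startRule = StartRule.TOKENS; // default start rule

    ModeLexer(String input) { this.input = input; }

    /** Returns the next token as "TYPE:text", or null at end of input. */
    String nextToken() {
        if (pos >= input.length()) return null;
        // Dispatch on the start rule chosen by the previous token's action.
        return startRule == StartRule.TOKENS ? tokens() : tagTokens();
    }

    // Outside a tag: only OPEN and PCDATA can match.
    private String tokens() {
        if (input.charAt(pos) == '<') {
            pos++;
            startRule = StartRule.TAG_TOKENS; // action: switch context
            return "OPEN:<";
        }
        int start = pos;
        while (pos < input.length() && input.charAt(pos) != '<') pos++;
        return "PCDATA:" + input.substring(start, pos);
    }

    // Inside a tag: only ID, EQ, STRING and CLOSE can match.
    private String tagTokens() {
        // Skip whitespace between tag tokens.
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) pos++;
        if (pos >= input.length()) return null;
        char c = input.charAt(pos);
        if (c == '>') { pos++; startRule = StartRule.TOKENS; return "CLOSE:>"; }
        if (c == '=') { pos++; return "EQ:="; }
        if (c == '"') {
            int end = input.indexOf('"', pos + 1);
            if (end < 0) throw new IllegalStateException("unterminated string at " + pos);
            String s = input.substring(pos, end + 1);
            pos = end + 1;
            return "STRING:" + s;
        }
        if (Character.isLetter(c)) {
            int start = pos;
            while (pos < input.length() && Character.isLetterOrDigit(input.charAt(pos))) pos++;
            return "ID:" + input.substring(start, pos);
        }
        throw new IllegalStateException("unexpected char '" + c + "' at " + pos);
    }

    /** Convenience: lex the whole input into a list of "TYPE:text" strings. */
    static List<String> lexAll(String input) {
        ModeLexer lexer = new ModeLexer(input);
        List<String> out = new ArrayList<>();
        for (String t = lexer.nextToken(); t != null; t = lexer.nextToken()) out.add(t);
        return out;
    }
}
```

With this sketch, ModeLexer.lexAll("a=b<a x=\"1\">a=b") treats the first and last "a=b" as PCDATA, while the "=" inside the tag comes out as EQ. The same character lexes differently depending on which start rule is active, which is the gating behavior we are after.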

Ter

