[antlr-interest] lexer "modes" for XML parsing etc...
Terence Parr
parrt at cs.usfca.edu
Sat Nov 19 13:01:48 PST 2005
Hi Oliver,
I'm in a situation where, for v3, I need an island grammar for the
stuff to the right of the rewrite "->" symbol to handle string
templates. So, I have to think about modes or switching streams or
something to deal with different contexts within the same stream (I'm
ignoring include-file-style stream switching for this email).
Your problem was that you wanted to gate certain rules in/out for XML
parsing (inside/outside of a tag), right? Can you simply define
rules as usual for outside of a tag and then another rule that says
which rules can be in the tag:
class L extends Lexer;
{protected boolean insideTag=false;}
// normal stuff (is put into mTokens() method)
PCDATA : ... ;
CDATA : ... ;
COMMENT : ... ;
OPEN : '<' {insideTag=true;} ;
// inside a tag
ID : ... ;
EQ : '=' ;
STRING : ... ;
CLOSE : '>' {insideTag=false;} ;
TAG_TOKENS : ID | EQ | STRING | CLOSE | ... ;
...
Then in a subclass of L, do this:
class SL extends L {
    public Token mTokens() {
        if ( insideTag ) {
            return mTAG_TOKENS();
        }
        else {
            return super.mTokens();
        }
    }
}
Will something like that work? It avoids predicates in the lexer
grammar; we add the test by hand in code, so it really acts as a gate.
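To make the gate idea concrete without depending on generated code, here
is a minimal hand-written sketch in plain Java. It is not ANTLR output:
GateLexer, outsideToken(), tagToken() and the string token names are all
made up for illustration, and the rules are far simpler than real XML.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written stand-in for the generated lexer; not ANTLR output.
class GateLexer {
    protected final String in;
    protected int pos = 0;
    protected boolean insideTag = false;

    GateLexer(String in) { this.in = in; }

    // Rules allowed outside a tag: OPEN, or a run of PCDATA up to the next '<'.
    protected String outsideToken() {
        if (in.charAt(pos) == '<') {
            pos++;
            insideTag = true;               // same job as the OPEN action
            return "OPEN";
        }
        int start = pos;
        while (pos < in.length() && in.charAt(pos) != '<') pos++;
        return "PCDATA(" + in.substring(start, pos) + ")";
    }

    // Rules allowed inside a tag: ID, EQ, STRING, CLOSE (blanks are skipped).
    protected String tagToken() {
        while (pos < in.length() && in.charAt(pos) == ' ') pos++;
        if (pos >= in.length()) return "EOF";
        char c = in.charAt(pos);
        if (c == '>') { pos++; insideTag = false; return "CLOSE"; }
        if (c == '=') { pos++; return "EQ"; }
        if (c == '"') {
            int end = in.indexOf('"', pos + 1);
            if (end < 0) throw new IllegalStateException("unterminated string at " + pos);
            String s = in.substring(pos + 1, end);
            pos = end + 1;
            return "STRING(" + s + ")";
        }
        int start = pos;
        while (pos < in.length() && Character.isLetterOrDigit(in.charAt(pos))) pos++;
        if (pos == start) throw new IllegalStateException("unexpected char '" + c + "' at " + pos);
        return "ID(" + in.substring(start, pos) + ")";
    }

    // The gate: the flag alone decides which rule set is allowed to match.
    String nextToken() {
        if (pos >= in.length()) return "EOF";
        return insideTag ? tagToken() : outsideToken();
    }

    // Convenience helper: collect every token up to (not including) EOF.
    static List<String> tokenize(String text) {
        GateLexer lx = new GateLexer(text);
        List<String> out = new ArrayList<>();
        for (String t = lx.nextToken(); !t.equals("EOF"); t = lx.nextToken()) out.add(t);
        return out;
    }
}
```

With the input a=b<t x="1">c, the leading a=b comes back as a single
PCDATA token, while x="1" inside the tag comes back as ID, EQ, STRING.
The same characters lex differently depending only on the flag.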
In v3, perhaps we can formalize this situation (single input stream,
multiple contexts) by allowing you to set the start rule for the lexer.
The default is Tokens but you can define another tokens rule and then
have an action set the next start rule (for when the lexer is asked
to emit a token again).
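A rough sketch of how a settable start rule might look, again in
hand-written Java. This is only one guess at the shape; nothing here is
an actual v3 API. StartRuleLexer, tokens(), templateTokens() and the
token names are hypothetical, and the "template" rule just grabs
everything up to the next ';'.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of a lexer with a settable start rule; not a v3 API.
class StartRuleLexer {
    private final String in;
    private int pos = 0;
    // Rule to run the next time a token is requested; defaults to tokens().
    private Supplier<String> startRule = this::tokens;

    StartRuleLexer(String in) { this.in = in; }

    // Default start rule: ID, SEMI, and ARROW. The ARROW "action" switches
    // the start rule so the next request uses the template rule.
    private String tokens() {
        while (pos < in.length() && in.charAt(pos) == ' ') pos++;
        if (pos >= in.length()) return "EOF";
        if (in.startsWith("->", pos)) {
            pos += 2;
            startRule = this::templateTokens;
            return "ARROW";
        }
        if (in.charAt(pos) == ';') { pos++; return "SEMI"; }
        int start = pos;
        while (pos < in.length() && Character.isLetter(in.charAt(pos))) pos++;
        if (pos == start) throw new IllegalStateException("unexpected char at " + pos);
        return "ID(" + in.substring(start, pos) + ")";
    }

    // Alternate start rule for the island: everything up to ';' is one token,
    // then the start rule is switched back to the default.
    private String templateTokens() {
        int start = pos;
        while (pos < in.length() && in.charAt(pos) != ';') pos++;
        startRule = this::tokens;
        return "TEMPLATE(" + in.substring(start, pos).trim() + ")";
    }

    // Each request simply runs whichever start rule is currently set.
    String nextToken() {
        if (pos >= in.length()) return "EOF";
        return startRule.get();
    }

    // Convenience helper: collect every token up to (not including) EOF.
    static List<String> tokenize(String text) {
        StartRuleLexer lx = new StartRuleLexer(text);
        List<String> out = new ArrayList<>();
        for (String t = lx.nextToken(); !t.equals("EOF"); t = lx.nextToken()) out.add(t);
        return out;
    }
}
```

Compared with a boolean flag, this scales to more than two contexts:
each action names the next start rule directly instead of the token
loop having to test a growing set of flags.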
Ter