[antlr-interest] lexical modes

Wed Jun 7 20:01:46 PDT 2006

On Wed, 2006-06-07 at 14:01 -0700, Terence Parr wrote:
> Hi, consider matching strings in the lexer.  It's pretty easy in  
> ANTLR as you can make rule references:
> 
> STRING : '"' (ESC | .)* '"' ;
> ESC : ... ;
> 
> What if you want the lexer, though, to return a stream of tokens chosen
> from a different set in between square brackets, such as when
> recognizing regular expressions?  Inside [...] you can treat '('
> as just a char, not a grouping symbol.  Rather than creating and
> switching to a new lexer every time you see a '[', perhaps good old
> lexical modes from lex are the right idea.
> 
> grammar regex;
> 
> expr : atom | range | ebnf | ... ;
> 
> range : LBRACK (CHAR | CHAR DASH CHAR)+ RBRACK ;
> 
> LBRACK : '[' {pushMode(inside_brackets);} ;
> 
> mode inside_brackets;
> 
> CHAR : ... ;
> DASH : '-' ;
> RBRACK : ']' {popMode();} ;

You could qualify the mode somehow, or group its rules in a block:

LBRACK :....

mode inside_brackets
{
CHAR : ...
DASH : ...
}

OTHER:...

This would allow one to keep tokens that apply to a mode close to the
rules that use it.
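
To make the mode-stack idea concrete, here is a rough sketch in Python rather than ANTLR. It is not ANTLR syntax or ANTLR's implementation, just an illustration of pushMode/popMode with the rules grouped per mode as suggested above. The rule table, the tokenize() function, the "default" mode name, the LPAREN/RPAREN tokens and the CHAR patterns are all my own assumptions, since CHAR is elided in the grammar above.

```python
import re

# Token rules grouped by mode, mirroring the "mode inside_brackets { ... }" idea.
# Each rule is (token name, regex, action); action is None, ("push", mode) or ("pop",).
# All names and patterns here are hypothetical, chosen only for illustration.
RULES = {
    "default": [
        ("LBRACK", r"\[", ("push", "inside_brackets")),
        ("LPAREN", r"\(", None),
        ("RPAREN", r"\)", None),
        ("CHAR", r"[^\[\]()]", None),
    ],
    "inside_brackets": [
        ("RBRACK", r"\]", ("pop",)),
        ("DASH", r"-", None),
        ("CHAR", r"[^\]]", None),  # '(' is an ordinary char in this mode
    ],
}


def tokenize(text):
    """Return (name, text) pairs, trying only the rules of the current mode."""
    modes = ["default"]  # the mode stack; the last entry is the active mode
    pos = 0
    tokens = []
    while pos < len(text):
        for name, pattern, action in RULES[modes[-1]]:
            m = re.compile(pattern).match(text, pos)
            if m:
                tokens.append((name, m.group()))
                pos = m.end()
                if action is not None:
                    if action[0] == "push":
                        modes.append(action[1])
                    else:
                        modes.pop()
                break
        else:
            # No rule of the active mode matched at this position.
            raise ValueError(
                f"no rule matches {text[pos]!r} at {pos} in mode {modes[-1]}"
            )
    return tokens
```

With this table, tokenize("(a)[(-b]") yields LPAREN for the first '(' but CHAR for the '(' inside the brackets, because only the rules of the active mode are tried. Keeping each mode's rules in one dictionary entry is the same locality argument as the braces above.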