[antlr-interest] lexical modes

Steve Murphy murf at parsetree.com
Thu Jun 8 07:26:12 PDT 2006


On Wed, 7 Jun 2006 14:01:39 -0700  (15:01 MDT), Terence Parr
<parrt at cs.usfca.edu> wrote:
> Hi, consider matching strings in the lexer.  It's pretty easy in  
> ANTLR as you can make rule references:
> 
> STRING : '"' (ESC | .)* '"' ;
> ESC : ... ;
> 
> What if you want the lexer, though, to return a stream of tokens chosen
> from a different set in between square brackets, such as when
> recognizing regular expressions?  Inside [...] you can refer to '('
> as just a char, not a grouping symbol.  Rather than creating and
> switching to a new lexer every time you see a '[', perhaps good old
> lexical modes from lex are the right idea.
> 
> grammar regex;
> 
> expr : atom | range | ebnf | ... ;
> 
> range : LBRACK (CHAR | CHAR DASH CHAR)+ RBRACK ;
> 
> LBRACK : '[' {pushMode(inside_brackets);} ;
> 
> mode inside_brackets;
> 
> CHAR : ... ;
> DASH : '-' ;
> RBRACK : ']' {popMode();} ;
> 
> Something like that...make sense to add?  ANTLR can just
> switch-on-mode when it enters nextToken() to jump to the appropriate
> set of lexical rules.
> 
> Ter

This sounds great! What do you do about look-ahead tokens? Any concerns
there?
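
To make the look-ahead question concrete, here is a minimal Python sketch of how I picture the mode stack working. It is not ANTLR code and not necessarily how ANTLR would implement it: ModalLexer, RULES and tokenize are hypothetical names, and the CHAR patterns are my own guess, since the grammar above elides them.

```python
import re

# Hypothetical rule tables, one per mode. Each rule is
# (token name, pattern, action), where action is None,
# ("push", mode_name) or ("pop", None).
RULES = {
    "default": [
        ("LBRACK", re.compile(r"\["), ("push", "inside_brackets")),
        ("LPAREN", re.compile(r"\("), None),
        ("RPAREN", re.compile(r"\)"), None),
        # Assumed definition: any char that is not a bracket or paren.
        ("CHAR", re.compile(r"[^\[\]()]"), None),
    ],
    "inside_brackets": [
        ("RBRACK", re.compile(r"\]"), ("pop", None)),
        ("DASH", re.compile(r"-"), None),
        # Assumed definition: anything except the closing bracket,
        # so '(' is just a char here.
        ("CHAR", re.compile(r"[^\]]"), None),
    ],
}


class ModalLexer:
    """Toy lexer that picks its rule set from the top of a mode stack."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.modes = ["default"]

    def next_token(self):
        if self.pos >= len(self.text):
            return ("EOF", "")
        # "Switch on mode": only the current mode's rules are tried.
        for name, pattern, action in RULES[self.modes[-1]]:
            m = pattern.match(self.text, self.pos)
            if m:
                self.pos = m.end()
                if action is not None:
                    if action[0] == "push":
                        self.modes.append(action[1])
                    else:
                        self.modes.pop()
                return (name, m.group())
        raise ValueError(
            "no rule matches %r in mode %s"
            % (self.text[self.pos], self.modes[-1])
        )


def tokenize(text):
    """Pull tokens one at a time until EOF."""
    lexer = ModalLexer(text)
    tokens = []
    while True:
        tok = lexer.next_token()
        if tok[0] == "EOF":
            return tokens
        tokens.append(tok)


if __name__ == "__main__":
    for tok in tokenize("([(-a])"):
        print(tok)
```

In this sketch the mode changes inside the lexer at the moment LBRACK or RBRACK is matched, so every later token is lexed under the right mode no matter how far ahead the parser pulls. My worry is the other case: if a mode switch were ever triggered from a parser action, any tokens already sitting in the look-ahead buffer would have been lexed under the old mode.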

murf



