[antlr-interest] lexical modes

Terence Parr parrt at cs.usfca.edu
Wed Jun 7 14:01:39 PDT 2006


Hi, consider matching strings in the lexer.  It's pretty easy in
ANTLR because lexer rules can reference other rules:

STRING : '"' (ESC | .)* '"' ;
ESC : ... ;

What if, though, you want the lexer to return a stream of tokens
drawn from a different set between square brackets, as when
recognizing regular expressions?  Inside [...], '(' is just a char,
not a grouping symbol.  Rather than creating and switching to a new
lexer every time you see a '[', perhaps good old lexical modes from
lex are the right idea.

grammar regex;

expr : atom | range | ebnf | ... ;

range : LBRACK (CHAR | CHAR DASH CHAR)+ RBRACK ;

LBRACK : '[' {pushMode(inside_brackets);} ;

mode inside_brackets;

CHAR : ... ;
DASH : '-' ;
RBRACK : ']' {popMode();} ;

Something like that... does it make sense to add?  ANTLR could just
switch on the current mode when it enters nextToken() to jump to the
appropriate set of lexical rules.
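To make the switch-on-mode idea concrete, here is a minimal hand-written Java sketch of how nextToken() might dispatch on a mode stack. It is not ANTLR-generated code: the class, enum, and token names (ModalRegexLexer, Mode, Type) are hypothetical, and every rule is reduced to a single character so the mode logic stays visible.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical hand-written lexer illustrating lexical modes (not ANTLR output). */
class ModalRegexLexer {
    enum Mode { DEFAULT, INSIDE_BRACKETS }
    enum Type { LBRACK, RBRACK, LPAREN, RPAREN, CHAR, DASH, EOF }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
        @Override public String toString() { return type + ":" + text; }
    }

    private final String input;
    private int pos = 0;
    private Mode mode = Mode.DEFAULT;                        // current mode
    private final Deque<Mode> modeStack = new ArrayDeque<>(); // saved outer modes

    ModalRegexLexer(String input) { this.input = input; }

    void pushMode(Mode m) { modeStack.push(mode); mode = m; }
    void popMode() { mode = modeStack.pop(); }

    /** Switch on the current mode to pick which set of rules applies. */
    Token nextToken() {
        if (pos >= input.length()) return new Token(Type.EOF, "");
        char c = input.charAt(pos++);
        String text = String.valueOf(c);
        switch (mode) {
            case INSIDE_BRACKETS:
                if (c == ']') { popMode(); return new Token(Type.RBRACK, text); }
                if (c == '-') return new Token(Type.DASH, text);
                return new Token(Type.CHAR, text);   // '(' is just a char in here
            case DEFAULT:
            default:
                if (c == '[') { pushMode(Mode.INSIDE_BRACKETS); return new Token(Type.LBRACK, text); }
                if (c == '(') return new Token(Type.LPAREN, text); // grouping symbol out here
                if (c == ')') return new Token(Type.RPAREN, text);
                return new Token(Type.CHAR, text);
        }
    }

    /** Drain the lexer into a list, excluding the EOF token. */
    List<Token> allTokens() {
        List<Token> out = new ArrayList<>();
        for (Token t = nextToken(); t.type != Type.EOF; t = nextToken()) out.add(t);
        return out;
    }

    public static void main(String[] args) {
        // prints [LPAREN:(, LBRACK:[, CHAR:a, DASH:-, CHAR:(, RBRACK:], RPAREN:)]
        System.out.println(new ModalRegexLexer("([a-(]").allTokens().subList(0, 0).isEmpty()
                ? new ModalRegexLexer("([a-(])").allTokens()
                : "");
    }
}
```

The same '(' comes back as LPAREN outside the brackets and as CHAR inside them. Keeping a stack rather than a single mode variable means popMode() returns to whatever mode was active before the '[', so a mode entered from another mode unwinds correctly.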

Ter


