[antlr-interest] lexical modes
Sohail Somani
sohail at taggedtype.net
Wed Jun 7 20:01:46 PDT 2006
On Wed, 2006-06-07 at 14:01 -0700, Terence Parr wrote:
> Hi, consider matching strings in the lexer. It's pretty easy in
> ANTLR as you can make rule references:
>
> STRING : '"' (ESC | .)* '"' ;
> ESC : ... ;
>
> What if you want the lexer, though, to return a stream of tokens chosen
> from a different set in between square brackets, such as when
> recognizing regular expressions? Inside [...] you can refer to '('
> as just a char, not a grouping symbol. Rather than creating and
> switching to a new lexer every time you see a '[', perhaps good old
> lexical modes from lex are the right idea.
>
> grammar regex;
>
> expr : atom | range | ebnf | ... ;
>
> range : LBRACK (CHAR | CHAR DASH CHAR)+ RBRACK ;
>
> LBRACK : '[' {pushMode(inside_brackets);} ;
>
> mode inside_brackets;
>
> CHAR : ... ;
> DASH : '-' ;
> RBRACK : ']' {popMode();} ;

You could qualify the mode somehow or group it:

LBRACK : ...

mode inside_brackets
{
    CHAR : ...
    DASH : ...
}

OTHER : ...

This would allow one to keep tokens that apply to a mode close to the
rules that use it.
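
For anyone who wants to play with the idea outside ANTLR, here is a rough
sketch in Python of a lexer driven by a stack of modes. It is not how ANTLR
does (or would do) it. The MODES table, the tokenize function, the extra
LPAREN/RPAREN tokens, and the patterns for CHAR and OTHER (elided as "..." in
the grammar) are all assumptions made up for illustration.

```python
import re

# Hypothetical token tables, one per mode. Each entry is
# (token name, pattern, action); action is None, ("push", mode) or ("pop", None).
# Rules are tried in order, so earlier entries win.
MODES = {
    "default": [
        ("LBRACK", re.compile(r"\["), ("push", "inside_brackets")),
        ("LPAREN", re.compile(r"\("), None),
        ("RPAREN", re.compile(r"\)"), None),
        ("OTHER", re.compile(r"[^\[()]"), None),    # assumed catch-all
    ],
    "inside_brackets": [
        ("RBRACK", re.compile(r"\]"), ("pop", None)),
        ("DASH", re.compile(r"-"), None),
        ("CHAR", re.compile(r"[^\]-]"), None),      # assumed: any other char
    ],
}


def tokenize(text):
    """Return (name, lexeme) pairs, picking rules from the mode on top of the stack."""
    stack = ["default"]
    pos = 0
    tokens = []
    while pos < len(text):
        for name, pattern, action in MODES[stack[-1]]:
            m = pattern.match(text, pos)
            if m:
                tokens.append((name, m.group()))
                pos = m.end()
                if action is not None:
                    kind, target = action
                    if kind == "push":
                        stack.append(target)   # like pushMode(inside_brackets)
                    else:
                        stack.pop()            # like popMode()
                break
        else:
            raise ValueError(f"no rule matches at offset {pos} in mode {stack[-1]}")
    return tokens
```

With this, tokenize("([a-z(])") gives LPAREN for the first '(' but CHAR for
the '(' inside the brackets, which is the behaviour Terence describes.
Keeping each mode's rules in one table entry is roughly the grouping I am
suggesting for the grammar syntax.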