[antlr-interest] lexical modes
Steve Murphy
murf at parsetree.com
Thu Jun 8 07:26:12 PDT 2006
On Wed, 7 Jun 2006 14:01:39 -0700 (15:01 MDT), Terence Parr
<parrt at cs.usfca.edu> wrote:
> Hi, consider matching strings in the lexer. It's pretty easy in
> ANTLR as you can make rule references:
>
> STRING : '"' (ESC | .)* '"' ;
> ESC : ... ;
>
> What if you want the lexer, though, to return a stream of tokens
> chosen from a different set in between square brackets, such as when
> recognizing regular expressions? Inside [...] you can refer to '('
> as just a char, not a grouping symbol. Rather than creating and
> switching to a new lexer every time you see a '[', perhaps good old
> lexical modes from lex are the right idea.
>
> grammar regex;
>
> expr : atom | range | ebnf | ... ;
>
> range : LBRACK (CHAR | CHAR DASH CHAR)+ RBRACK ;
>
> LBRACK : '[' {pushMode(inside_brackets);} ;
>
> mode inside_brackets;
>
> CHAR : ... ;
> DASH : '-' ;
> RBRACK : ']' {popMode();} ;
>
> Something like that... Does it make sense to add? ANTLR can just
> switch on the mode when it enters nextToken() to jump to the
> appropriate set of lexical rules.
>
> Ter
This sounds great! What do you do about look-ahead tokens? Any concerns
there?
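
For concreteness, here is a minimal sketch of the switch-on-mode idea quoted above, written in Python rather than as generated ANTLR code. It is not ANTLR's implementation. The ModeLexer class, the RULES table, the mode names, and the token patterns (including LPAREN and RPAREN, and the concrete CHAR patterns) are all assumptions made for illustration. The lexer keeps a stack of modes, and next_token() consults only the rule set for the mode on top of the stack.

```python
import re

# Hypothetical mode names and rule tables; none of this is ANTLR API.
DEFAULT, INSIDE_BRACKETS = "default", "inside_brackets"

# Each rule is (token name, pattern, action), where action is None,
# ("push", mode) or ("pop", None). Rules are tried in order.
RULES = {
    DEFAULT: [
        ("LBRACK", re.compile(r"\["), ("push", INSIDE_BRACKETS)),
        ("LPAREN", re.compile(r"\("), None),
        ("RPAREN", re.compile(r"\)"), None),
        ("CHAR", re.compile(r"[^\[\]()]"), None),
    ],
    INSIDE_BRACKETS: [
        ("RBRACK", re.compile(r"\]"), ("pop", None)),
        ("DASH", re.compile(r"-"), None),
        ("CHAR", re.compile(r"[^\]]"), None),  # '(' is a plain char here
    ],
}


class ModeLexer:
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.modes = [DEFAULT]  # mode stack; the top entry is active

    def push_mode(self, mode):
        self.modes.append(mode)

    def pop_mode(self):
        self.modes.pop()

    def next_token(self):
        if self.pos >= len(self.text):
            return ("EOF", "")
        # Switch on the current mode to pick the set of lexical rules.
        for name, pattern, action in RULES[self.modes[-1]]:
            m = pattern.match(self.text, self.pos)
            if m:
                self.pos = m.end()
                if action is not None:
                    kind, target = action
                    if kind == "push":
                        self.push_mode(target)
                    else:
                        self.pop_mode()
                return (name, m.group())
        raise ValueError("no rule matches at position %d" % self.pos)


def tokenize(text):
    """Collect all tokens up to, but not including, EOF."""
    lexer = ModeLexer(text)
    tokens = []
    while True:
        tok = lexer.next_token()
        if tok[0] == "EOF":
            return tokens
        tokens.append(tok)
```

With this sketch, tokenize("(a)[(-z]") yields LPAREN for the first '(' but CHAR for the '(' inside the brackets. In this sketch the push and pop happen inside the lexer as each token is matched, so the resulting token stream does not depend on how far ahead a parser has buffered tokens. That would not hold if the parser were the one switching modes.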
murf