[antlr-interest] lexical modes
Micheal J
open.zone at virgin.net
Wed Jun 7 15:44:52 PDT 2006
Ter,
> What if you want the lexer though to return a stream of tokens chosen
> from a different set in between square brackets such as when
> recognizing regular expressions. Inside [...] you can refer to '('
> as just a char not a grouping symbol. Rather than creating and
> switching to a new lexer every time you see a '[', perhaps good old
> lexical modes from lex are the right idea.
Well, you wouldn't *have to* create a lexer on every switch; you could just
reuse one that's lying around for that very purpose. Still, I like lex-style
modes: low [performance and implementation] cost.
> grammar regex;
>
> expr : atom | range | ebnf | ... ;
>
> range : LBRACK (CHAR | CHAR DASH CHAR)+ RBRACK ;
>
> LBRACK : '[' {pushMode(inside_brackets);} ;
>
> mode inside_brackets;
>
> CHAR : ... ;
> DASH : '-' ;
> RBRACK : ']' {popMode();} ;
>
> Something like that...make sense to add? ANTLR can just switch-on-
> mode when it enters nextToken() to jump to the appropriate set of
> lexical rules.
+1
That's how I write lexers manually... ;-)
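
For readers who want to see the idea concretely, here is a minimal sketch of a
hand-written lexer that switches on a mode stack each time it fetches a token.
It is not ANTLR output and not anyone's actual implementation: the MODES table,
the tokenize function and the LPAREN/RPAREN token names are made up for
illustration, and only LBRACK, RBRACK, DASH, CHAR and inside_brackets come from
the grammar quoted above.

```python
import re

# Hypothetical per-mode rule tables; rules are tried in order, first match wins.
MODES = {
    "default": [
        ("LBRACK", re.compile(r"\[")),
        ("LPAREN", re.compile(r"\(")),
        ("RPAREN", re.compile(r"\)")),
        ("CHAR", re.compile(r".", re.S)),
    ],
    "inside_brackets": [
        ("RBRACK", re.compile(r"\]")),
        ("DASH", re.compile(r"-")),
        ("CHAR", re.compile(r".", re.S)),
    ],
}


def tokenize(text):
    """Return (token_name, lexeme) pairs, switching rule sets by mode."""
    mode_stack = ["default"]
    pos = 0
    tokens = []
    while pos < len(text):
        # Switch on the current mode to pick the active set of rules.
        for name, pattern in MODES[mode_stack[-1]]:
            match = pattern.match(text, pos)
            if match:
                break
        else:
            raise ValueError("no rule matches at offset %d" % pos)
        tokens.append((name, match.group()))
        pos = match.end()
        # Equivalent of the pushMode/popMode actions in the quoted grammar.
        if name == "LBRACK":
            mode_stack.append("inside_brackets")
        elif name == "RBRACK":
            mode_stack.pop()
    return tokens
```

With this table, a '(' seen between '[' and ']' comes back as a plain CHAR,
while the same character outside the brackets comes back as LPAREN. The only
per-token overhead is one lookup on the top of the mode stack.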
I presume that the lexer switching infrastructure still remains for neato
#include handling and such, right?
Suggestion: all modes should be pre-declared in a declaration section, as per
[F]lex:
grammar regex;
....
modes ..., ..., ...., .....;
xmodes .....;
and each mode definition should be in a block:
mode inside_brackets [
.....
.....
];
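
Assuming "modes" and "xmodes" are meant to mirror flex's inclusive (%s) and
exclusive (%x) start conditions, the difference comes down to which rules are
live in a given mode. In flex, rules with no start condition stay active inside
an inclusive condition but are switched off inside an exclusive one. The sketch
below only illustrates that selection rule; the RULES table and the
active_rules helper are hypothetical and not part of ANTLR or flex.

```python
# Hypothetical rule table: each rule lists the modes it is tagged with.
# An empty tuple means the rule carries no mode tag.
RULES = [
    ("LBRACK", ()),
    ("LPAREN", ()),
    ("DASH", ("inside_brackets",)),
    ("RBRACK", ("inside_brackets",)),
]


def active_rules(mode, exclusive_modes):
    """Return the names of the rules that are live while in `mode`."""
    active = []
    for name, modes in RULES:
        if mode in modes:
            # Rules tagged with the current mode are always live.
            active.append(name)
        elif not modes and mode not in exclusive_modes:
            # Untagged rules stay live unless the current mode is exclusive.
            active.append(name)
    return active
```

For the regex case, inside_brackets would want to be exclusive, so that the
untagged LPAREN rule cannot fire between '[' and ']'.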
Micheal
-----------------------
The best way to contact me is via the list/forum. My time is very limited.