[antlr-interest] lexical modes

Wed Jun 7 15:44:52 PDT 2006

Ter,

> What if you want the lexer though to return a stream of tokens chosen
> from a different set in between square brackets such as when
> recognizing regular expressions.  Inside [...] you can refer to '('  
> as just a char not a grouping symbol.  Rather than creating and  
> switching to a new lexer every time you see a '[', perhaps good old  
> lexical modes from lex are the right idea.

Well, you wouldn't *have to* create a lexer on every switch; just reuse one
that's lying around for that very purpose. Still, I like lex-style modes. Low
[performance and implementation] cost.

> grammar regex;
> 
> expr : atom | range | ebnf | ... ;
> 
> range : LBRACK (CHAR | CHAR DASH CHAR)+ RBRACK ;
> 
> LBRACK : '[' {pushMode(inside_brackets);} ;
> 
> mode inside_brackets;
> 
> CHAR : ... ;
> DASH : '-' ;
> RBRACK : ']' {popMode();} ;
> 
> Something like that...make sense to add?  ANTLR can just
> switch-on-mode when it enters nextToken() to jump to the appropriate
> set of lexical rules.

+1

That's how I write lexers manually... ;-)
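
For what it's worth, here is a minimal sketch of the hand-written approach,
in Python rather than ANTLR syntax. The rule tables, the tokenize() name and
the default-mode token set are assumptions made up for illustration; only the
LBRACK/CHAR/DASH/RBRACK names and the push/pop on '[' and ']' come from the
proposal above.

```python
import re

# Hypothetical rule tables: one ordered list of (token_name, regex) per mode.
RULES = {
    "default": [
        ("LBRACK", re.compile(r"\[")),
        ("LPAREN", re.compile(r"\(")),
        ("RPAREN", re.compile(r"\)")),
        ("STAR", re.compile(r"\*")),
        ("CHAR", re.compile(r"[^\[\]()*]")),
    ],
    "inside_brackets": [
        ("RBRACK", re.compile(r"\]")),
        ("DASH", re.compile(r"-")),
        ("CHAR", re.compile(r"[^\]-]")),  # '(' is just a char in here
    ],
}


def tokenize(text):
    """Return (token_name, lexeme) pairs, switching rule sets via a mode stack."""
    modes = ["default"]  # top of the stack selects the active rule set
    pos = 0
    tokens = []
    while pos < len(text):
        # Switch on the current mode, then try its rules in order.
        for name, pattern in RULES[modes[-1]]:
            match = pattern.match(text, pos)
            if match:
                break
        else:
            raise ValueError(f"no rule matches at offset {pos} in mode {modes[-1]}")
        tokens.append((name, match.group()))
        pos = match.end()
        if name == "LBRACK":
            modes.append("inside_brackets")  # pushMode(inside_brackets)
        elif name == "RBRACK":
            modes.pop()  # popMode()
    return tokens
```

With this, tokenize("(a)[(-b]") yields LPAREN for the first '(' but CHAR for
the one inside the brackets, using a single lexer object throughout.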

I presume that the lexer switching infrastructure still remains for neato
#include handling and such, right?

Suggestion: all modes should be pre-declared in a declaration section as per
[F]lex

	grammar regex;
	....
	modes ..., ..., ...., .....;
	xmodes .....;

and each mode definition should be in a block:

	mode inside_brackets [
		.....
		.....
	];
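
A rough sketch of what the modes/xmodes split could mean, assuming it follows
Flex's inclusive (%s) versus exclusive (%x) start conditions: in an inclusive
mode, rules not tied to any mode stay active alongside the mode's own rules;
in an exclusive mode, only the rules declared for that mode are active. The
function and table names below are hypothetical.

```python
# Hypothetical rule table: (token_name, set of modes the rule is declared in).
# An empty set means the rule is not tied to any mode.
SAMPLE_RULES = [
    ("LPAREN", set()),
    ("CHAR", {"inside_brackets"}),
    ("RBRACK", {"inside_brackets"}),
]


def active_rules(rules, mode, exclusive_modes):
    """Return the names of the rules the lexer would try in the given mode."""
    active = []
    for name, rule_modes in rules:
        if mode in rule_modes:
            # Rule is explicitly declared for this mode.
            active.append(name)
        elif not rule_modes and mode not in exclusive_modes:
            # Unqualified rules also apply in the initial and inclusive modes.
            active.append(name)
    return active
```

Declared under xmodes, inside_brackets would see only CHAR and RBRACK;
declared under modes, it would also see LPAREN, which is probably not what
the regex example wants.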

Micheal

-----------------------
The best way to contact me is via the list/forum. My time is very limited.