[antlr-interest] Memory issue while lexer initialization

Mon Dec 31 02:20:04 PST 2007

At 23:08 31/12/2007, elango m wrote:
>I wanted to define keywords (around 50) in my lexer, something 
>similar to the following. When I remove this from the grammar I 
>don't see the memory issue. Is there any way to solve this 
>problem?
>
>KEYWORD : 'Keyword1' | 'Keyword2' | .... | 'Keyword50';

I'm not sure why the rule above would cause the memory problem 
you describe (maybe the lexer's state transitions just blow up 
exponentially), but the usual way to do this sort of thing is 
like this:

tokens {
   KEYWORD1 = 'Keyword1';
   KEYWORD2 = 'Keyword2';
   ...
   KEYWORD50 = 'Keyword50';
}

...

keyword: KEYWORD1 | KEYWORD2 | .... | KEYWORD50;

This way each keyword gets its own unique token, so you can 
specify exactly which keywords are accepted in a given place, or 
just use the catchall 'keyword' rule when you don't care.
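
As a rough sketch of what that looks like on the parser side 
(cut down to two keywords; the grammar name, the 'declaration' 
and 'label' rules, and the ID and WS lexer rules are all made up 
for illustration, and I'm assuming ANTLR 3 with the Java target 
for the WS action):

grammar KeywordSketch;

tokens {
   KEYWORD1 = 'Keyword1';
   KEYWORD2 = 'Keyword2';
}

// Hypothetical rule: only KEYWORD1 is accepted here.
declaration: KEYWORD1 ID ';';

// Hypothetical rule: any keyword will do, so use the catchall.
label: keyword ':';

keyword: KEYWORD1 | KEYWORD2;

// Hypothetical lexer rules, just enough to make the sketch complete.
ID: ('a'..'z' | 'A'..'Z' | '_')+;
WS: (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; };

The parser rules only ever refer to the token names, so adding 
the other keywords should just be a matter of extending the 
tokens section and the 'keyword' rule.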

Note, though, that it's not always quite this simple in practice: 
sometimes the lexer needs a little help disambiguating certain 
keywords, especially if they share a common prefix.