[antlr-interest] Gated semantic predicates without lookahead?

Fri Feb 20 14:13:41 PST 2009

At 07:34 18/02/2009, Andreas Meyer wrote:
 >Let's say I have 600 keywords that I want to recognize as
 >something like an identifier, but in other places, I really
 >want different tokens.
 >
 >Currently, I have 600 rules that do some table lookup, so
 >that they consume an identifier and return one of the 600
 >possible artificial keyword tokens. But still, this makes
 >the stream of tokens look like an endless stream of
 >identifiers to the parser, and I have a vague feeling
 >that this might not be the best possible solution.

You're right, that's going to make the parser slower.

That's one of the reasons why I prefer the other approach to solve 
the "keywords as identifiers" problem -- recognise the keywords as 
individual tokens in the lexer, and then have a ridiculously long 
"identifier" rule in the parser that accepts the catch-all ID 
token in addition to all the tokens for the individual 
keywords.  Since matching that rule is just an integer comparison 
of the token type against a token set, it ought to be significantly 
faster than repeated string comparisons.  (Having said that, I 
haven't actually *measured* this.)

(It is slightly more work to add a new keyword, though, since you 
have to put it in two places.)
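
To make the "integer match against a token set" idea concrete, here is a minimal sketch in plain Java. It is not ANTLR-generated code. The token type constants, the three sample keywords, and the names KeywordTokens, tokenTypeFor and isIdentifierLike are all made up for illustration, and the small lookup table merely stands in for the lexer's per-keyword rules. The point is only that once the lexer has picked a token type, the parser-side "identifier" check is a single bit test with no string comparison.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

class KeywordTokens {
    // Hypothetical token types; a generated lexer would define its own constants.
    public static final int ID = 4;
    public static final int KW_SELECT = 5;
    public static final int KW_FROM = 6;
    public static final int KW_WHERE = 7;
    public static final int INT_LITERAL = 8;

    // Place 1 (lexer side): each keyword gets its own token type.
    // This table is a stand-in for individual keyword rules in the lexer.
    private static final Map<String, Integer> KEYWORDS = new HashMap<>();

    // Place 2 (parser side): every token type the "identifier" rule accepts.
    private static final BitSet IDENTIFIER_LIKE = new BitSet();

    static {
        KEYWORDS.put("select", KW_SELECT);
        KEYWORDS.put("from", KW_FROM);
        KEYWORDS.put("where", KW_WHERE);

        IDENTIFIER_LIKE.set(ID);
        IDENTIFIER_LIKE.set(KW_SELECT);
        IDENTIFIER_LIKE.set(KW_FROM);
        IDENTIFIER_LIKE.set(KW_WHERE);
    }

    // String work happens once per token, when the lexer classifies the text.
    public static int tokenTypeFor(String text) {
        Integer keywordType = KEYWORDS.get(text);
        return keywordType != null ? keywordType : ID;
    }

    // The parser's "identifier" check: an integer lookup in a bit set.
    public static boolean isIdentifierLike(int tokenType) {
        return IDENTIFIER_LIKE.get(tokenType);
    }

    public static void main(String[] args) {
        int type = tokenTypeFor("from");
        System.out.println("keyword token: " + (type == KW_FROM));
        System.out.println("usable as identifier: " + isIdentifierLike(type));
    }
}
```

Note that a new keyword has to be added to both the keyword table and the identifier set, which mirrors the "two places" cost mentioned above.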