[antlr-interest] Controlling Lexer from Parser

Thu Dec 3 02:59:19 PST 2009

At 23:07 3/12/2009, Gokulakannan Somasundaram wrote:
>I am trying to parse a SQL grammar in which the SQL keywords are 
>sometimes allowed as table names / column names.
>a)  Say when I am expecting a table name / column name in the 
>parser, I set a global variable called x.
>b) I check this x to set the token type of that particular token.
>
>This will succeed only if the parser completes executing the 
>parsing actions before the lexer tries to make tokens out of the 
>input stream. Is that always the case with ANTLR? I see no reason 
>why this should not work, but I want to make sure. (The lexer 
>and parser are in different grammar files.)

No.  In fact it is never the case with ANTLR -- the lexer runs to 
completion and generates the entire token stream before any parser 
rules are executed, so a flag set from a parser action comes too 
late to change how any token was typed.
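
To make the ordering problem concrete, here is a minimal sketch in plain Python. It is a hand-written stand-in, not ANTLR-generated code: the names lex, parse and expecting_name are made up for illustration, and the only assumption carried over from above is that the whole input is tokenized before parsing starts.

```python
import re

KEYWORDS = {"SELECT", "FROM"}
expecting_name = False  # hypothetical global flag the parser would like to set


def lex(text):
    """Tokenize the whole input up front, consulting the flag for each word."""
    tokens = []
    for word in re.findall(r"[A-Za-z_]+", text):
        if word.upper() in KEYWORDS and not expecting_name:
            tokens.append((word.upper(), word))  # typed as a keyword
        else:
            tokens.append(("ID", word))
    return tokens


def parse(tokens):
    """Walk the finished token list, setting the flag after FROM."""
    global expecting_name
    kinds = []
    for kind, _text in tokens:
        kinds.append(kind)
        if kind == "FROM":
            # Too late: the following token was already typed inside lex().
            expecting_name = True
    return kinds


# A table that happens to be called "select".
kinds = parse(lex("SELECT name FROM select"))
print(kinds)
```

The table name still arrives as a SELECT token, because the flag only changes after every token has already been created.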

There are two common strategies for doing what you're trying to do 
(both documented in the wiki):
   1. Lex any letter sequence as an ID, then use semantic 
predicates in the parser to treat specific IDs as keywords when 
their text matches.
   2. Lex keywords as individual keyword tokens (e.g. SELECT) and 
anything else as an ID, then define a parser rule "id" that 
accepts ID or any of the keywords, and use that rule in any context 
where you want an identifier.

(The second is my preferred method, but either one will work.)
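
As a rough illustration of the second strategy, the sketch below uses a tiny hand-written lexer and recursive-descent parser in Python rather than an ANTLR grammar. The two-keyword SQL subset, the Parser class and the method names are all assumptions made for the example; the point is only the shape of the "id" rule.

```python
import re

KEYWORDS = {"SELECT", "FROM"}


def lex(text):
    """Keywords get their own token type; every other word is an ID."""
    tokens = []
    for word in re.findall(r"[A-Za-z_]\w*", text):
        kind = word.upper() if word.upper() in KEYWORDS else "ID"
        tokens.append((kind, word))
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def match(self, kind):
        """Consume one token of exactly the given type, or fail."""
        if self.pos >= len(self.tokens) or self.tokens[self.pos][0] != kind:
            raise SyntaxError(f"expected {kind} at token {self.pos}")
        self.pos += 1

    def ident(self):
        """The 'id' rule: accept an ID or any keyword token as a name."""
        if self.pos >= len(self.tokens):
            raise SyntaxError("expected an identifier at end of input")
        kind, text = self.tokens[self.pos]
        if kind != "ID" and kind not in KEYWORDS:
            raise SyntaxError(f"expected an identifier at token {self.pos}")
        self.pos += 1
        return text

    def select_stmt(self):
        """select_stmt : SELECT id FROM id"""
        self.match("SELECT")
        column = self.ident()
        self.match("FROM")
        table = self.ident()
        return column, table


# A column called "from" in a table called "select".
result = Parser(lex("SELECT from FROM select")).select_stmt()
print(result)
```

The lexer needs no knowledge of context here: the parser decides, by which rule it calls, whether a keyword token is acting as a keyword or as a name.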

In either case, you can either leave the token type alone (e.g. if 
directly executing actions in the parser) or change the type as 
needed (e.g. if building an AST for later processing).
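
For comparison, the first strategy combined with retagging might look roughly like the following Python sketch. Again this is a hand-rolled stand-in rather than ANTLR code: the text comparison in keyword() plays the role of a semantic predicate, and the function names and the mutable [type, text] token layout are assumptions for illustration only.

```python
import re


def lex(text):
    """Every word becomes an ID token; the lexer knows no keywords."""
    return [["ID", word] for word in re.findall(r"[A-Za-z_]\w*", text)]


def keyword(tokens, pos, spelling):
    """Predicate-style check: the ID at pos must spell the given keyword."""
    if pos >= len(tokens) or tokens[pos][1].upper() != spelling:
        raise SyntaxError(f"expected {spelling} at token {pos}")
    tokens[pos][0] = spelling  # optional: retag the token for later passes
    return pos + 1


def any_name(tokens, pos):
    """Accept whatever ID is at pos as a name, keyword-like or not."""
    if pos >= len(tokens):
        raise SyntaxError("expected a name at end of input")
    return tokens[pos][1], pos + 1


def select_stmt(tokens):
    """select_stmt : 'select' name 'from' name, matched by token text."""
    pos = keyword(tokens, 0, "SELECT")
    column, pos = any_name(tokens, pos)
    pos = keyword(tokens, pos, "FROM")
    table, pos = any_name(tokens, pos)
    return column, table


tokens = lex("select from from select")
parsed = select_stmt(tokens)
print(parsed, [kind for kind, _ in tokens])
```

Dropping the retagging line in keyword() corresponds to leaving the token types alone; keeping it gives a later tree-processing phase distinct keyword and ID types to work with.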