[antlr-interest] QUESTION on: How do I handle abbreviated keywords?

Gavin Lambert antlr at mirality.co.nz
Fri Oct 31 19:52:34 PDT 2008


At 14:00 1/11/2008, Ben Gillis wrote:
>See http://www.antlr.org/wiki/pages/viewpage.action?pageId=1802308.
>
>It's not clear to me what the connection is between the tokens block 
>(and its auto-generated code) and this statement in the above URL:
>
>"might simply consult an IDictionary<string,int> map of all 
>keywords (incl abbreviations). "
>
>The tokens block ends up in a string array named tokenNames 
>(CSharp2 target).  My keyword tokens are mixed in with other 
>entries related to the grammar definition.
>
>Am I supposed to write an initialization routine that builds a 
>dictionary?  If so, I'd have to filter the auto-generated 
>tokenNames so that only my keywords are entered; otherwise I'll 
>get false hits in my CheckKeywordsTable routine.  Somehow, this 
>doesn't seem quite right.

The tokenNames array is a list of token *names*, which is useless 
for that purpose.  For that particular keyword-matching strategy, 
what you're after is a mapping from keyword *text* to token *type* 
(its integer value), which is an entirely different thing.
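
To make the difference concrete, here is a rough sketch in Java 
rather than the CSharp2 target (the idea is the same in either).  
The token type numbers and the contents of the names array are made 
up for illustration; a real generated tokenNames array holds other 
entries as well.

```java
import java.util.HashMap;
import java.util.Map;

class TokenNamesVsKeywordTable {
    // Hypothetical token type constants, standing in for generated ones.
    static final int BEGIN = 4, END = 5, WHILE = 6, IDENTIFIER = 7;

    // tokenNames-style array (made-up contents): indexed by token type,
    // holding token *names*, so it answers "what is type 6 called?".
    static final String[] TOKEN_NAMES = {
        "<invalid>", "<EOR>", "<DOWN>", "<UP>", "BEGIN", "END", "WHILE", "IDENTIFIER"
    };

    // What keyword matching needs instead: keyword *text* -> token type,
    // so it answers "is the text 'while' a keyword, and which one?".
    static final Map<String, Integer> KEYWORD_TABLE = new HashMap<>();
    static {
        KEYWORD_TABLE.put("begin", BEGIN);
        KEYWORD_TABLE.put("end", END);
        KEYWORD_TABLE.put("while", WHILE);
    }

    public static void main(String[] args) {
        System.out.println(TOKEN_NAMES[WHILE]);           // prints WHILE
        System.out.println(KEYWORD_TABLE.get("while"));   // prints 6
        System.out.println(KEYWORD_TABLE.get("WHILE"));   // prints null: a name is not keyword text
    }
}
```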

Say you have the keywords "begin", "end", and "while".  Your 
tokens block declares imaginary token types like this:

tokens {
   BEGIN;
   END;
   WHILE;
}

These carry no text and can't match anything by themselves, but 
they *do* each get a token type number allocated.  In your lexer's 
constructor, you additionally set up a dictionary mapping like so:

   // assumes a lexer field:
   // Dictionary<string, int> keywordTable = new Dictionary<string, int>();
   keywordTable.Add("begin", BEGIN);
   keywordTable.Add("end", END);
   keywordTable.Add("while", WHILE);

Then in the CheckKeywordsTable function you use that mapping to 
return the "real" token type: either the one listed in the table, 
or the catch-all IDENTIFIER when the text isn't a keyword.  For 
more complicated cases you may need something smarter than a 
dictionary lookup, but that's up to you.
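
A minimal sketch of the whole arrangement, again in Java and 
assuming hypothetical token type values; the class name and the 
checkKeywordsTable signature are illustrative, not something ANTLR 
generates for you:

```java
import java.util.HashMap;
import java.util.Map;

class KeywordLexerSketch {
    // Hypothetical stand-ins for the constants the tokens block allocates.
    static final int BEGIN = 4, END = 5, WHILE = 6, IDENTIFIER = 7;

    // Keyword text -> token type (the role IDictionary<string,int> plays in C#).
    private final Map<String, Integer> keywordTable = new HashMap<>();

    KeywordLexerSketch() {
        // Filled once, in the lexer's constructor.
        keywordTable.put("begin", BEGIN);
        keywordTable.put("end", END);
        keywordTable.put("while", WHILE);
    }

    // Given the text matched by a generic identifier rule, return the
    // keyword's type if the text is in the table, otherwise IDENTIFIER.
    int checkKeywordsTable(String text) {
        return keywordTable.getOrDefault(text, IDENTIFIER);
    }

    public static void main(String[] args) {
        KeywordLexerSketch lexer = new KeywordLexerSketch();
        System.out.println(lexer.checkKeywordsTable("while"));  // prints 6
        System.out.println(lexer.checkKeywordsTable("whilst")); // prints 7
    }
}
```

The identifier rule's action would then call this on the matched 
text and use the result as the token's type; anything not in the 
table simply falls through to IDENTIFIER.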

(To handle abbreviations, which is what that page primarily 
focuses on, you'll obviously also have to add each keyword's valid 
abbreviations to the table.)
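
One way to populate the table with abbreviations, sketched in Java 
under the assumption (mine, not the wiki page's) that a keyword may 
be shortened to any prefix of at least some minimum length.  The 
minimum lengths and the addKeyword helper are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

class AbbreviationTable {
    // Hypothetical token type constants.
    static final int BEGIN = 4, END = 5, WHILE = 6, IDENTIFIER = 7;

    private final Map<String, Integer> keywordTable = new HashMap<>();

    // Register the full keyword plus every prefix of it that is at least
    // minLength characters long, all mapping to the same token type.
    void addKeyword(String keyword, int minLength, int tokenType) {
        for (int len = minLength; len <= keyword.length(); len++) {
            String prefix = keyword.substring(0, len);
            Integer previous = keywordTable.putIfAbsent(prefix, tokenType);
            // Refuse an abbreviation already claimed by a different keyword.
            if (previous != null && previous != tokenType) {
                throw new IllegalArgumentException("ambiguous abbreviation: " + prefix);
            }
        }
    }

    int checkKeywordsTable(String text) {
        return keywordTable.getOrDefault(text, IDENTIFIER);
    }

    public static void main(String[] args) {
        AbbreviationTable t = new AbbreviationTable();
        t.addKeyword("begin", 3, BEGIN); // beg, begi, begin
        t.addKeyword("end", 3, END);     // end only
        t.addKeyword("while", 2, WHILE); // wh, whi, whil, while
        System.out.println(t.checkKeywordsTable("whi")); // prints 6
        System.out.println(t.checkKeywordsTable("be"));  // prints 7 (below the minimum)
    }
}
```

Checking for an existing, different entry guards against two 
keywords sharing a permitted abbreviation, which would otherwise 
silently overwrite one another in the table.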


