[antlr-interest] lexing multiple literals to one token

Sat Jul 30 06:19:52 PDT 2005

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org 
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Robert Anderson
> Sent: Wednesday, July 27, 2005 9:54 PM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] lexing multiple literals to one token
> 
> I can't quite figure out the syntax for this:
> 
> I want to lex two different (interchangeable) keywords into 
> the same token.  I want to use the tokens {..} mechanism 
> because I want both of these to be considered by a 
> testLiterals=true identifier rule option. 
> How do I do this?  The following don't seem to work:
> 
> tokens {
>    MYTOK="form1";
>    MYTOK="form2";
> }

Easy: just override testLiteralsTable() in the lexer. At least, it's easy
if you're using C++; I don't know whether you can do it in Java. Keep
MYTOK="form1"; in the tokens block, then put this at the top of your lexer.g,
right after the tokens block, and it will be copied into the generated lexer
code:

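{
   // Maps "form2" to MYTOK and lets the normal literals table handle
   // everything else (including "form1" from the tokens block).
   int testLiteralsTable( int ttype ) const
   {
      // 'text' is a std::string, so _tcsicmp here assumes a non-Unicode
      // (ANSI/MBCS) MSVC build.
      if ( _tcsicmp( text.c_str(), _T( "form2" ) ) == 0 )
      {
         ttype = MYTOK;
      }
      else
      {
         // __super is MSVC-specific; the portable spelling is
         // antlr::CharScanner::testLiteralsTable( ttype ).
         ttype = __super::testLiteralsTable( ttype );
      }

      return ttype;
   }
}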

If you need a case-sensitive comparison, use _tcscmp() instead, or just use
the '==' operator on the std::string class. 'text' is a member variable of
the lexer that holds the text of the current token being processed.

Don