[antlr-interest] tokenVocab option leads to incomplete DFA in lexer

Fri Jan 25 05:29:32 PST 2008

Finally, I have found a way to fool the grammar generator, although this page
http://www.antlr.org/wiki/display/ANTLR3/Migrating+from+ANTLR+2+to+ANTLR+3
states that it is not possible to assign token types to certain literals
("Apparently, 'testLiterals' on tokens is no longer allowed (it is now
unnecessary).")
and a comment in org.antlr.tool.AssignTokenTypesWalker.java says
  // if lexer, don't allow aliasing in tokens section

If you create a Basic.tokens file that contains these lines

  DEFINE=101
  DECLARE=102

and then, in E.g (in the lexer part of the grammar), you create a
dedicated lexer rule for each selected literal, like this

  DECLARE: 'declare';
  DEFINE: 'define';

everything will work :)

The whole grammar:

grammar E;

options {
  tokenVocab = Basic;
  output = AST;
  ASTLabelType = CommonTree;
}

program  : ( statement )+   ;

statement
  : DEFINE ID '=' INT ';'
  | DECLARE ID ';'
  ;

DECLARE : 'declare';
DEFINE : 'define';

ID   : ('a'..'z'|'A'..'Z')+ ;
INT  : '0'..'9'+  ;
WS  : ( '\n' | '\r' | ' ' | '\t' )+    { $channel = HIDDEN; }  ;

The only drawback is that you have to write DECLARE instead of 'declare'
in the parser rules
(but this may also be seen as an advantage, because if you
misspell DECLARE, antlr.Tool will detect it).

The generated E.tokens file contains the correct values:

  DEFINE=101
  INT=104
  WS=105
  DECLARE=102
  ID=103
  '='=106
  ';'=107
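
If you want to check this automatically rather than by eye, something like the following could be used. It is only a rough sketch in Python, not part of ANTLR: the helper names parse_tokens and changed_types are made up for illustration, and the file contents are pasted in as strings (copied from the listings above) instead of being read from disk. It assumes every line of a .tokens file has the form name=number.

```python
def parse_tokens(text):
    """Parse the contents of an ANTLR .tokens file into {name: type}."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the LAST '=' so that a quoted literal such as '=' survives.
        name, _, value = line.rpartition("=")
        mapping[name] = int(value)
    return mapping


def changed_types(predefined, generated):
    """Return {name: (expected, actual)} for every predefined token whose
    type is missing or different in the generated vocabulary."""
    return {
        name: (expected, generated.get(name))
        for name, expected in predefined.items()
        if generated.get(name) != expected
    }


# Contents of Basic.tokens and of the generated E.tokens, as shown above.
BASIC_TOKENS = """\
DEFINE=101
DECLARE=102
"""

E_TOKENS = """\
DEFINE=101
INT=104
WS=105
DECLARE=102
ID=103
'='=106
';'=107
"""

basic = parse_tokens(BASIC_TOKENS)
generated = parse_tokens(E_TOKENS)
mismatches = changed_types(basic, generated)
print(mismatches)  # an empty dict means the predefined types were kept
```

An empty result means that DEFINE and DECLARE kept the numbers given in Basic.tokens; any entry in the result would name a token whose type was reassigned by the tool.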

-- 
Erik Kratochvíl