[antlr-interest] ANTLR enforces LL(1) beyond about 300 tokens

A Z asicaddress at gmail.com
Sun Aug 22 12:38:05 PDT 2010


On Sun, Aug 22, 2010 at 11:54 AM, Kenneth Domino <
kenneth.domino at domemtech.com> wrote:

> allows a clean compile but the ANTLR book indicates ANTLR should try to
>> match in the order listed.
>>
>
> Hard to say without looking at the full grammar file, but you could try to
> separate
> the lexer and parser grammars into two files, then use "option
> {filter=true;}" in
> the lexer grammar.  (The option seems to only work if you make two
> grammars.)
> That generates a lexer that uses a rule order priority, which is what you
> mention.
> By default, Antlr does not generate this kind of lexer since the
> filter-type lexer
> has its problems (See The Definitive ANTLR Reference).  Also, when
> separating
> the parser and lexer grammars, it seems Antlr has a problem keeping the two
> token lists
> in synch.  You can fix this with a hack: declare all tokens, with the same
> exact order,
> in the two grammars.  Not sure why,
>
> Ken
>
>
>
>
Thanks for the response.

I modified the grammar to remove all the parser rules(except for one so
ANTLR doesn't complain). At the bottom is the grammar which compiles with no
warnings or errors.

If I comment out either K_STATIC or K_STRUCT, the compile fails with 40
warnings and 1 error:

________________________

warning(209): temp2.g:280:1: Multiple token rules can match input such as
"'&'": AMPERSAND, AMPSTAR, AMPTWO, AMPTHREE

As a result, token(s) AMPSTAR,AMPTWO,AMPTHREE were disabled for that input
warning(209): temp2.g:283:1: Multiple token rules can match input such as
"'*'": ASTERISK, ASTWO, FULLCON, ASCOLCOLAS, ASRPARAN

As a result, token(s) ASTWO,FULLCON,ASCOLCOLAS,ASRPARAN were disabled for
that input
...
error(208): temp2.g:299:1: The following token definitions can never be
matched because prior tokens match the same input: <Then it lists all of the
tokens>
________________________


I now think this may be specific to the binary I am using
(ANTLR Parser Generator  Version 3.2 Fedora release 12 (Constantine) Thu Mar
11 20:28:57 UTC 2010).

 I do not see the issue if I use ANTLRWorks. I'll look into this more.





grammar temp2;


tokens
{
K_ACCEPT_ON                = 'accept_on';
K_ALIAS                    = 'alias';
K_ALWAYS                   = 'always';
K_ALWAYS_COMB              = 'always_comb';
K_ALWAYS_FF                = 'always_ff';
K_ALWAYS_LATCH             = 'always_latch';
K_AND                      = 'and';
K_ASSERT                   = 'assert';
K_ASSIGN                   = 'assign';
K_ASSUME                   = 'assume';
K_AUTOMATIC                = 'automatic';
K_BEFORE                   = 'before';
K_BEGIN                    = 'begin';
K_BIND                     = 'bind';
K_BINS                     = 'bins';
K_BINSOF                   = 'binsof';
K_BIT                      = 'bit';
K_BREAK                    = 'break';
K_BUF                      = 'buf';
K_BUFIF0                   = 'bufif0';
K_BUFIF1                   = 'bufif1';
K_BYTE                     = 'byte';
K_CASE                     = 'case';
K_CASEX                    = 'casex';
K_CASEZ                    = 'casez';
K_CELL                     = 'cell';
K_CHANDLE                  = 'chandle';
K_CHECKER                  = 'checker';
K_CLASS                    = 'class';
K_CLOCKING                 = 'clocking';
K_CMOS                     = 'cmos';
K_CONFIG                   = 'config';
K_CONST                    = 'const';
K_CONSTRAINT               = 'constraint';
K_CONTEXT                  = 'context';
K_CONTINUE                 = 'continue';
K_COVER                    = 'cover';
K_COVERGROUP               = 'covergroup';
K_COVERPOINT               = 'coverpoint';
K_CROSS                    = 'cross';
K_DEASSIGN                 = 'deassign';
K_DEFAULT                  = 'default';
K_DEFPARAM                 = 'defparam';
K_DESIGN                   = 'design';
K_DISABLE                  = 'disable';
K_DIST                     = 'dist';
K_DO                       = 'do';
K_EDGE                     = 'edge';
K_ELSE                     = 'else';
K_END                      = 'end';
K_ENDCASE                  = 'endcase';
K_ENDCHECKER               = 'endchecker';
K_ENDCLASS                 = 'endclass';
K_ENDCLOCKING              = 'endclocking';
K_ENDCONFIG                = 'endconfig';
K_ENDFUNCTION              = 'endfunction';
K_ENDGENERATE              = 'endgenerate';
K_ENDGROUP                 = 'endgroup';
K_ENDINTERFACE             = 'endinterface';
K_ENDMODULE                = 'endmodule';
K_ENDPACKAGE               = 'endpackage';
K_ENDPRIMITIVE             = 'endprimitive';
K_ENDPROGRAM               = 'endprogram';
K_ENDPROPERTY              = 'endproperty';
K_ENDSPECIFY               = 'endspecify';
K_ENDSEQUENCE              = 'endsequence';
K_ENDTABLE                 = 'endtable';
K_ENDTASK                  = 'endtask';
K_ENUM                     = 'enum';
K_EVENT                    = 'event';
K_EVENTUALLY               = 'eventually';
K_EXPECT                   = 'expect';
K_EXPORT                   = 'export';
K_EXTENDS                  = 'extends';
K_EXTERN                   = 'extern';
K_FINAL                    = 'final';
K_FIRST_MATCH              = 'first_match';
K_FOR                      = 'for';
K_FORCE                    = 'force';
K_FOREACH                  = 'foreach';
K_FOREVER                  = 'forever';
K_FORK                     = 'fork';
K_FORKJOIN                 = 'forkjoin';
K_FUNCTION                 = 'function';
K_GENERATE                 = 'generate';
K_GENVAR                   = 'genvar';
K_GLOBAL                   = 'global';
K_HIGHZ0                   = 'highz0';
K_HIGHZ1                   = 'highz1';
K_IF                       = 'if';
K_IFF                      = 'iff';
K_IFNONE                   = 'ifnone';
K_IGNORE_BINS              = 'ignore_bins';
K_ILLEGAL_BINS             = 'illegal_bins';
K_IMPLIES                  = 'implies';
K_IMPORT                   = 'import';
K_INCDIR                   = 'incdir';
K_INCLUDE                  = 'include';
K_INITIAL                  = 'initial';
K_INOUT                    = 'inout';
K_INPUT                    = 'input';
K_INSIDE                   = 'inside';
K_INSTANCE                 = 'instance';
K_INT                      = 'int';
K_INTEGER                  = 'integer';
K_INTERFACE                = 'interface';
K_INTERSECT                = 'intersect';
K_JOIN                     = 'join';
K_JOIN_ANY                 = 'join_any';
K_JOIN_NONE                = 'join_none';
K_LARGE                    = 'large';
K_LET                      = 'let';
K_LIBLIST                  = 'liblist';
K_LIBRARY                  = 'library';
K_LOCAL                    = 'local';
K_LOCALPARAM               = 'localparam';
K_LOGIC                    = 'logic';
K_LONGINT                  = 'longint';
K_MACROMODULE              = 'macromodule';
K_MATCHES                  = 'matches';
K_MEDIUM                   = 'medium';
K_MODPORT                  = 'modport';
K_MODULE                   = 'module';
K_NAND                     = 'nand';
K_NEGEDGE                  = 'negedge';
K_NEW                      = 'new';
K_NEXTTIME                 = 'nexttime';
K_NMOS                     = 'nmos';
K_NOR                      = 'nor';
K_NOSHOWCANCELLED          = 'noshowcancelled';
K_NOT                      = 'not';
K_NOTIF0                   = 'notif0';
K_NOTIF1                   = 'notif1';
K_NULL                     = 'null';
K_OR                       = 'or';
K_OUTPUT                   = 'output';
K_PACKAGE                  = 'package';
K_PACKED                   = 'packed';
K_PARAMETER                = 'parameter';
K_PMOS                     = 'pmos';
K_POSEDGE                  = 'posedge';
K_PRIMITIVE                = 'primitive';
K_PRIORITY                 = 'priority';
K_PROGRAM                  = 'program';
K_PROPERTY                 = 'property';
K_PROTECTED                = 'protected';
K_PULL0                    = 'pull0';
K_PULL1                    = 'pull1';
K_PULLDOWN                 = 'pulldown';
K_PULLUP                   = 'pullup';
K_PULSESTYLE_ONDETECT      = 'pulsestyle_ondetect';
K_PULSESTYLE_ONEVENT       = 'pulsestyle_onevent';
K_PURE                     = 'pure';
K_RAND                     = 'rand';
K_RANDC                    = 'randc';
K_RANDCASE                 = 'randcase';
K_RANDSEQUENCE             = 'randsequence';
K_RCMOS                    = 'rcmos';
K_REAL                     = 'real';
K_REALTIME                 = 'realtime';
K_REF                      = 'ref';
K_REG                      = 'reg';
K_REJECT_ON                = 'reject_on';
K_RELEASE                  = 'release';
K_REPEAT                   = 'repeat';
K_RESTRICT                 = 'restrict';
K_RETURN                   = 'return';
K_RNMOS                    = 'rnmos';
K_RPMOS                    = 'rpmos';
K_RTRAN                    = 'rtran';
K_RTRANIF0                 = 'rtranif0';
K_RTRANIF1                 = 'rtranif1';
K_SCALARED                 = 'scalared';
K_SEQUENCE                 = 'sequence';
K_SHORTINT                 = 'shortint';
K_SHORTREAL                = 'shortreal';
K_SHOWCANCELLED            = 'showcancelled';
K_SIGNED                   = 'signed';
K_SMALL                    = 'small';
K_SOLVE                    = 'solve';
K_SPECIFY                  = 'specify';
K_SPECPARAM                = 'specparam';
//K_STATIC                   = 'static';
K_STRING                   = 'string';
K_STRONG                   = 'strong';
K_STRONG0                  = 'strong0';
K_STRONG1                  = 'strong1';
//K_STRUCT                   = 'struct';
K_SUPER                    = 'super';
K_SUPPLY0                  = 'supply0';
K_SUPPLY1                  = 'supply1';
K_TABLE                    = 'table';
K_TASK                     = 'task';
K_TIME                     = 'time';
K_TRAN                     = 'tran';
K_TRANIF0                  = 'tranif0';
K_TRANIF1                  = 'tranif1';
K_TRI                      = 'tri';
K_TRI0                     = 'tri0';
K_TRI1                     = 'tri1';
K_TRIAND                   = 'triand';
K_TRIOR                    = 'trior';
K_TRIREG                   = 'trireg';
K_UNSIGNED                 = 'unsigned';
K_USE                      = 'use';
K_UWIRE                    = 'uwire';
K_VECTORED                 = 'vectored';
K_WAIT                     = 'wait';
K_WAND                     = 'wand';
K_WEAK0                    = 'weak0';
K_WEAK1                    = 'weak1';
K_WHILE                    = 'while';
K_WIRE                     = 'wire';
K_WOR                      = 'wor';
K_XNOR                     = 'xnor';
K_XOR                      = 'xor';
KD_FATAL                   = '$fatal';
KD_ERROR                   = '$error';
KD_WARNING                 = '$warning';
KD_INFO                    = '$info';
KD_HOLD                    = '$hold';
KD_SETUP                   = '$setup';
KD_SETUPHOLD               = '$setuphold';
KD_RECOVERY                = '$recovery';
ATSIGN                     = '@';
ATTWO                      = '@@';
PLUS                       = '+';
MINUS                      = '-';
ASTERISK                   = '*';
AMPERSAND                  = '&';
DOLLAR                     = '$';
TILDE                      = '~';
FSLASH                     = '/';
PERCENT                    = '%';
ASTWO                      = '**';
CGT                        = '>';
CLT                        = '<';
BANG                       = '!';
EQUALSTWO                  = '==';
BANGEQUALS                 = '!=';
EQUALSTHREE                = '===';
BANGEQUALSTWO              = '!==';
VBAR                       = '|';
LTTWO                      = '<<';
GTTWO                      = '>>';
LTTHREE                    = '<<<';
GTTHREE                    = '>>>';
POUND                      = '#';
LPARAN                     = '(';
RPARAN                     = ')';
SEMICOLON                  = ';';
COLON                      = ':';
COMMA                      = ',';
LBRACKET                   = '[';
RBRACKET                   = ']';
LBRACE                     = '{';
RBRACE                     = '}';
PERIOD                     = '.';
EQUALS                     = '=';
QMARK                      = '?';
EQUALSGT                   = '=>';
FULLCON                    = '*>';
AMPSTAR                    = '&*';
PERIODAS                   = '.*';
COLONCOLON                 = '::';
PLUSPLUS                   = '++';
MINUSMINUS                 = '--';
GTEQUALS                   = '>=';
LTEQUALS                   = '<=';
COLONEQUALS                = ':=';
COLONFSLASH                = ':/';
POUNDTWO                   = '##';
ASCOLCOLAS                 = '*::*';
POUNDMINUSPOUND            = '#-#';
POUNDEQUALSPOUND           = '#=#';
LBASRB                     = '[*]';
LBPLUSRB                   = '[+]';
AMPTWO                     = '&&';
AMPTHREE                   = '&&&';
BARTWO                     = '||';
LPARANAS                   = '(*';
ASRPARAN                   = '*)';
APOSTROPHE                 = '\'';
PLUSEQUALS                 = '+=';
MINUSEQUALS                = '-=';
PLUSCOLON                  = '+:';
MINUSCOLON                 = '-:';
LPASRP                     = '(*)';
BARMINUSGT                 = '|->';
TILDEAMP                   = '~&';
TILDEBAR                   = '~|';
CARET                      = '^';
TILDECARET                 = '~^';
CARETTILDE                 = '^~';
LTMINUSGT                  = '<->';

EQUALSTWOQMARK             = '==?';
BANGEQUALSQMARK            = '!=?';
MINUSGT                    = '->';
}

fragment Alpha     : ('a'..'z' | 'A'..'Z');
fragment IdentChar : ('0'..'9' | 'a'..'z' | 'A'..'Z' | '$' | '_');
SIMPLE_IDENT  : (Alpha | '_') IdentChar*;

unary_op  :
    PLUS
  | MINUS
  | BANG
  | TILDE
  | AMPERSAND
  | TILDEAMP
  | VBAR
  | TILDEBAR
  | CARET
  | TILDECARET
  | CARETTILDE;


More information about the antlr-interest mailing list