[antlr-interest] ANTLR enforces LL(1) beyond about 300 tokens
A Z
asicaddress at gmail.com
Sun Aug 22 12:38:05 PDT 2010
On Sun, Aug 22, 2010 at 11:54 AM, Kenneth Domino <
kenneth.domino at domemtech.com> wrote:
>> allows a clean compile but the ANTLR book indicates ANTLR should try to
>> match in the order listed.
>
> Hard to say without looking at the full grammar file, but you could try to
> separate the lexer and parser grammars into two files, then use
> "options {filter=true;}" in the lexer grammar. (The option seems to work
> only if you make two grammars.) That generates a lexer that uses rule-order
> priority, which is what you mention.
> By default, ANTLR does not generate this kind of lexer, since the
> filter-type lexer has its problems (see The Definitive ANTLR Reference).
> Also, when separating the parser and lexer grammars, ANTLR seems to have a
> problem keeping the two token lists in sync. You can fix this with a hack:
> declare all tokens, in exactly the same order, in both grammars. Not sure
> why.
>
> Ken
Thanks for the response.
I modified the grammar to remove all the parser rules (except for one, so
ANTLR doesn't complain). At the bottom is the grammar, which compiles with no
warnings or errors.
If I uncomment either K_STATIC or K_STRUCT (both are commented out in the
grammar below), the compile fails with 40 warnings and 1 error:
________________________
warning(209): temp2.g:280:1: Multiple token rules can match input such as
"'&'": AMPERSAND, AMPSTAR, AMPTWO, AMPTHREE
As a result, token(s) AMPSTAR,AMPTWO,AMPTHREE were disabled for that input
warning(209): temp2.g:283:1: Multiple token rules can match input such as
"'*'": ASTERISK, ASTWO, FULLCON, ASCOLCOLAS, ASRPARAN
As a result, token(s) ASTWO,FULLCON,ASCOLCOLAS,ASRPARAN were disabled for
that input
...
error(208): temp2.g:299:1: The following token definitions can never be
matched because prior tokens match the same input: <Then it lists all of the
tokens>
________________________
I now think this may be specific to the binary I am using
(ANTLR Parser Generator Version 3.2 Fedora release 12 (Constantine) Thu Mar
11 20:28:57 UTC 2010).
I do not see the issue if I use ANTLRWorks. I'll look into this more.
grammar temp2;
tokens
{
K_ACCEPT_ON = 'accept_on';
K_ALIAS = 'alias';
K_ALWAYS = 'always';
K_ALWAYS_COMB = 'always_comb';
K_ALWAYS_FF = 'always_ff';
K_ALWAYS_LATCH = 'always_latch';
K_AND = 'and';
K_ASSERT = 'assert';
K_ASSIGN = 'assign';
K_ASSUME = 'assume';
K_AUTOMATIC = 'automatic';
K_BEFORE = 'before';
K_BEGIN = 'begin';
K_BIND = 'bind';
K_BINS = 'bins';
K_BINSOF = 'binsof';
K_BIT = 'bit';
K_BREAK = 'break';
K_BUF = 'buf';
K_BUFIF0 = 'bufif0';
K_BUFIF1 = 'bufif1';
K_BYTE = 'byte';
K_CASE = 'case';
K_CASEX = 'casex';
K_CASEZ = 'casez';
K_CELL = 'cell';
K_CHANDLE = 'chandle';
K_CHECKER = 'checker';
K_CLASS = 'class';
K_CLOCKING = 'clocking';
K_CMOS = 'cmos';
K_CONFIG = 'config';
K_CONST = 'const';
K_CONSTRAINT = 'constraint';
K_CONTEXT = 'context';
K_CONTINUE = 'continue';
K_COVER = 'cover';
K_COVERGROUP = 'covergroup';
K_COVERPOINT = 'coverpoint';
K_CROSS = 'cross';
K_DEASSIGN = 'deassign';
K_DEFAULT = 'default';
K_DEFPARAM = 'defparam';
K_DESIGN = 'design';
K_DISABLE = 'disable';
K_DIST = 'dist';
K_DO = 'do';
K_EDGE = 'edge';
K_ELSE = 'else';
K_END = 'end';
K_ENDCASE = 'endcase';
K_ENDCHECKER = 'endchecker';
K_ENDCLASS = 'endclass';
K_ENDCLOCKING = 'endclocking';
K_ENDCONFIG = 'endconfig';
K_ENDFUNCTION = 'endfunction';
K_ENDGENERATE = 'endgenerate';
K_ENDGROUP = 'endgroup';
K_ENDINTERFACE = 'endinterface';
K_ENDMODULE = 'endmodule';
K_ENDPACKAGE = 'endpackage';
K_ENDPRIMITIVE = 'endprimitive';
K_ENDPROGRAM = 'endprogram';
K_ENDPROPERTY = 'endproperty';
K_ENDSPECIFY = 'endspecify';
K_ENDSEQUENCE = 'endsequence';
K_ENDTABLE = 'endtable';
K_ENDTASK = 'endtask';
K_ENUM = 'enum';
K_EVENT = 'event';
K_EVENTUALLY = 'eventually';
K_EXPECT = 'expect';
K_EXPORT = 'export';
K_EXTENDS = 'extends';
K_EXTERN = 'extern';
K_FINAL = 'final';
K_FIRST_MATCH = 'first_match';
K_FOR = 'for';
K_FORCE = 'force';
K_FOREACH = 'foreach';
K_FOREVER = 'forever';
K_FORK = 'fork';
K_FORKJOIN = 'forkjoin';
K_FUNCTION = 'function';
K_GENERATE = 'generate';
K_GENVAR = 'genvar';
K_GLOBAL = 'global';
K_HIGHZ0 = 'highz0';
K_HIGHZ1 = 'highz1';
K_IF = 'if';
K_IFF = 'iff';
K_IFNONE = 'ifnone';
K_IGNORE_BINS = 'ignore_bins';
K_ILLEGAL_BINS = 'illegal_bins';
K_IMPLIES = 'implies';
K_IMPORT = 'import';
K_INCDIR = 'incdir';
K_INCLUDE = 'include';
K_INITIAL = 'initial';
K_INOUT = 'inout';
K_INPUT = 'input';
K_INSIDE = 'inside';
K_INSTANCE = 'instance';
K_INT = 'int';
K_INTEGER = 'integer';
K_INTERFACE = 'interface';
K_INTERSECT = 'intersect';
K_JOIN = 'join';
K_JOIN_ANY = 'join_any';
K_JOIN_NONE = 'join_none';
K_LARGE = 'large';
K_LET = 'let';
K_LIBLIST = 'liblist';
K_LIBRARY = 'library';
K_LOCAL = 'local';
K_LOCALPARAM = 'localparam';
K_LOGIC = 'logic';
K_LONGINT = 'longint';
K_MACROMODULE = 'macromodule';
K_MATCHES = 'matches';
K_MEDIUM = 'medium';
K_MODPORT = 'modport';
K_MODULE = 'module';
K_NAND = 'nand';
K_NEGEDGE = 'negedge';
K_NEW = 'new';
K_NEXTTIME = 'nexttime';
K_NMOS = 'nmos';
K_NOR = 'nor';
K_NOSHOWCANCELLED = 'noshowcancelled';
K_NOT = 'not';
K_NOTIF0 = 'notif0';
K_NOTIF1 = 'notif1';
K_NULL = 'null';
K_OR = 'or';
K_OUTPUT = 'output';
K_PACKAGE = 'package';
K_PACKED = 'packed';
K_PARAMETER = 'parameter';
K_PMOS = 'pmos';
K_POSEDGE = 'posedge';
K_PRIMITIVE = 'primitive';
K_PRIORITY = 'priority';
K_PROGRAM = 'program';
K_PROPERTY = 'property';
K_PROTECTED = 'protected';
K_PULL0 = 'pull0';
K_PULL1 = 'pull1';
K_PULLDOWN = 'pulldown';
K_PULLUP = 'pullup';
K_PULSESTYLE_ONDETECT = 'pulsestyle_ondetect';
K_PULSESTYLE_ONEVENT = 'pulsestyle_onevent';
K_PURE = 'pure';
K_RAND = 'rand';
K_RANDC = 'randc';
K_RANDCASE = 'randcase';
K_RANDSEQUENCE = 'randsequence';
K_RCMOS = 'rcmos';
K_REAL = 'real';
K_REALTIME = 'realtime';
K_REF = 'ref';
K_REG = 'reg';
K_REJECT_ON = 'reject_on';
K_RELEASE = 'release';
K_REPEAT = 'repeat';
K_RESTRICT = 'restrict';
K_RETURN = 'return';
K_RNMOS = 'rnmos';
K_RPMOS = 'rpmos';
K_RTRAN = 'rtran';
K_RTRANIF0 = 'rtranif0';
K_RTRANIF1 = 'rtranif1';
K_SCALARED = 'scalared';
K_SEQUENCE = 'sequence';
K_SHORTINT = 'shortint';
K_SHORTREAL = 'shortreal';
K_SHOWCANCELLED = 'showcancelled';
K_SIGNED = 'signed';
K_SMALL = 'small';
K_SOLVE = 'solve';
K_SPECIFY = 'specify';
K_SPECPARAM = 'specparam';
//K_STATIC = 'static';
K_STRING = 'string';
K_STRONG = 'strong';
K_STRONG0 = 'strong0';
K_STRONG1 = 'strong1';
//K_STRUCT = 'struct';
K_SUPER = 'super';
K_SUPPLY0 = 'supply0';
K_SUPPLY1 = 'supply1';
K_TABLE = 'table';
K_TASK = 'task';
K_TIME = 'time';
K_TRAN = 'tran';
K_TRANIF0 = 'tranif0';
K_TRANIF1 = 'tranif1';
K_TRI = 'tri';
K_TRI0 = 'tri0';
K_TRI1 = 'tri1';
K_TRIAND = 'triand';
K_TRIOR = 'trior';
K_TRIREG = 'trireg';
K_UNSIGNED = 'unsigned';
K_USE = 'use';
K_UWIRE = 'uwire';
K_VECTORED = 'vectored';
K_WAIT = 'wait';
K_WAND = 'wand';
K_WEAK0 = 'weak0';
K_WEAK1 = 'weak1';
K_WHILE = 'while';
K_WIRE = 'wire';
K_WOR = 'wor';
K_XNOR = 'xnor';
K_XOR = 'xor';
KD_FATAL = '$fatal';
KD_ERROR = '$error';
KD_WARNING = '$warning';
KD_INFO = '$info';
KD_HOLD = '$hold';
KD_SETUP = '$setup';
KD_SETUPHOLD = '$setuphold';
KD_RECOVERY = '$recovery';
ATSIGN = '@';
ATTWO = '@@';
PLUS = '+';
MINUS = '-';
ASTERISK = '*';
AMPERSAND = '&';
DOLLAR = '$';
TILDE = '~';
FSLASH = '/';
PERCENT = '%';
ASTWO = '**';
CGT = '>';
CLT = '<';
BANG = '!';
EQUALSTWO = '==';
BANGEQUALS = '!=';
EQUALSTHREE = '===';
BANGEQUALSTWO = '!==';
VBAR = '|';
LTTWO = '<<';
GTTWO = '>>';
LTTHREE = '<<<';
GTTHREE = '>>>';
POUND = '#';
LPARAN = '(';
RPARAN = ')';
SEMICOLON = ';';
COLON = ':';
COMMA = ',';
LBRACKET = '[';
RBRACKET = ']';
LBRACE = '{';
RBRACE = '}';
PERIOD = '.';
EQUALS = '=';
QMARK = '?';
EQUALSGT = '=>';
FULLCON = '*>';
AMPSTAR = '&*';
PERIODAS = '.*';
COLONCOLON = '::';
PLUSPLUS = '++';
MINUSMINUS = '--';
GTEQUALS = '>=';
LTEQUALS = '<=';
COLONEQUALS = ':=';
COLONFSLASH = ':/';
POUNDTWO = '##';
ASCOLCOLAS = '*::*';
POUNDMINUSPOUND = '#-#';
POUNDEQUALSPOUND = '#=#';
LBASRB = '[*]';
LBPLUSRB = '[+]';
AMPTWO = '&&';
AMPTHREE = '&&&';
BARTWO = '||';
LPARANAS = '(*';
ASRPARAN = '*)';
APOSTROPHE = '\'';
PLUSEQUALS = '+=';
MINUSEQUALS = '-=';
PLUSCOLON = '+:';
MINUSCOLON = '-:';
LPASRP = '(*)';
BARMINUSGT = '|->';
TILDEAMP = '~&';
TILDEBAR = '~|';
CARET = '^';
TILDECARET = '~^';
CARETTILDE = '^~';
LTMINUSGT = '<->';
EQUALSTWOQMARK = '==?';
BANGEQUALSQMARK = '!=?';
MINUSGT = '->';
}
fragment Alpha : ('a'..'z' | 'A'..'Z');
fragment IdentChar : ('0'..'9' | 'a'..'z' | 'A'..'Z' | '$' | '_');
SIMPLE_IDENT : (Alpha | '_') IdentChar*;

unary_op
    : PLUS
    | MINUS
    | BANG
    | TILDE
    | AMPERSAND
    | TILDEAMP
    | VBAR
    | TILDEBAR
    | CARET
    | TILDECARET
    | CARETTILDE
    ;
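
For reference, a minimal sketch of the two-grammar layout suggested in the quoted reply, reduced to a few tokens. The grammar names Temp2Lexer and Temp2Parser are hypothetical, and this has not been checked against the full token list. It assumes that a filter-mode lexer tries rules in the order listed, so longer literals are placed before their prefixes.

```antlr
// Temp2Lexer.g (hypothetical file name)
lexer grammar Temp2Lexer;

options { filter = true; }  // rules are tried in the order they appear

// Longer literals first: in filter mode the first matching rule wins.
AMPTHREE  : '&&&';
AMPTWO    : '&&';
AMPSTAR   : '&*';
AMPERSAND : '&';

// Keywords are written as ordinary lexer rules in a lexer grammar.
K_STATIC  : 'static';
K_STRUCT  : 'struct';

fragment Alpha     : ('a'..'z' | 'A'..'Z');
fragment IdentChar : ('0'..'9' | 'a'..'z' | 'A'..'Z' | '$' | '_');
SIMPLE_IDENT : (Alpha | '_') IdentChar*;
```

```antlr
// Temp2Parser.g (hypothetical file name)
parser grammar Temp2Parser;

options { tokenVocab = Temp2Lexer; }  // token types come from Temp2Lexer.tokens

unary_op : AMPERSAND ;
```

ANTLR would need to process the lexer grammar first so that Temp2Lexer.tokens exists when the parser grammar is compiled. Two caveats, as far as I understand filter mode: input that no rule matches is skipped silently, and a keyword rule listed ahead of SIMPLE_IDENT will also match the start of a longer identifier such as 'statically'. These are presumably among the problems with filter lexers mentioned in the quoted reply.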