[antlr-interest] Problem, with minimal problem-causing grammar
Phil Oliver
antlr at olivercomputing.com
Sat Jun 9 21:45:06 PDT 2007
I've been having problems defining a grammar in ANTLRWorks (1.0.2, the
latest). I keep getting the "java.net.ConnectException: Connection
refused: connect" error which, contrary to what some have assumed,
does not appear to have anything to do with port numbers; it appears
to be caused by an uncaught Java out-of-memory error. Every generated
grammar that shows this problem in ANTLRWorks also blows up with an
out-of-memory error when I test it in Eclipse, despite the JVM having
over a gigabyte of RAM available.
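A quick way to confirm how much heap a given JVM launch can actually use is to print it from inside that JVM. This is only a minimal sketch: the class name HeapCheck and the method maxHeapMiB are hypothetical, and the only library call used is the standard Runtime.maxMemory().

```java
// Hypothetical helper: reports the maximum heap this JVM will attempt to use.
final class HeapCheck {
    // Returns the JVM's maximum heap size in MiB, as reported by the runtime.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

Running this with the same launch settings as the failing lexer test shows whether the heap limit in effect matches what was intended.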
This is (almost) the minimal grammar I could create that reproduces
the problem, and it is a very simple one. No grammar errors are flagged.
If there's an obvious problem to fix (one that actually does fix it),
I'd appreciate some feedback. Otherwise it looks like a bug in ANTLR
to me; my guess is that the generated code goes into some kind of
infinite loop that depletes memory. To reiterate what I've posted
before, the memory blowup occurs in the generated lexer, in this
static initializer:
static {
    int numStates = DFA4_transitionS.length;
    DFA4_transition = new short[numStates][];
    for (int i=0; i<numStates; i++) {
        DFA4_transition[i] = DFA.unpackEncodedString(DFA4_transitionS[i]);
    }
}
specifically, in the invocation of DFA.unpackEncodedString.
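For readers following along, the sketch below shows run-length decoding of the kind the name unpackEncodedString suggests. It assumes each packed string is a sequence of (count, value) char pairs. RleSketch, unpack, and estimateBytes are hypothetical names used for illustration, not the ANTLR runtime's code, and the state count in main is made up. If a rule such as ~'#' makes a DFA state cover nearly the whole 16-bit char range, a single row could expand to tens of thousands of shorts, which might account for the memory use without any infinite loop. That is a guess, not a confirmed diagnosis.

```java
// Sketch only: run-length decoding under the assumption that the packed
// string is a sequence of (count, value) char pairs.
final class RleSketch {
    // Expands (count, value) pairs into a flat short[] table row.
    static short[] unpack(String encoded) {
        int size = 0;
        for (int i = 0; i < encoded.length(); i += 2) {
            size += encoded.charAt(i);          // sum the run lengths
        }
        short[] data = new short[size];
        int di = 0;
        for (int i = 0; i < encoded.length(); i += 2) {
            char count = encoded.charAt(i);
            char value = encoded.charAt(i + 1);
            for (int j = 0; j < count; j++) {
                data[di++] = (short) value;     // repeat value count times
            }
        }
        return data;
    }

    // Rough size of the unpacked table: 2 bytes per short, ignoring
    // array headers and other JVM overhead.
    static long estimateBytes(int numStates, int entriesPerState) {
        return (long) numStates * entriesPerState * 2L;
    }

    public static void main(String[] args) {
        // Two runs: three 7s, then two 0xFFFF values (-1 as a short).
        short[] row = unpack("\u0003\u0007\u0002\uffff");
        System.out.println("row length: " + row.length);

        // Hypothetical figures: 1000 states, each spanning 65535 chars.
        System.out.println("estimated bytes: " + estimateBytes(1000, 65535));
    }
}
```

Under these assumptions the cost grows with the number of DFA states times the width of the character range each state covers, which would fit the observation that removing a few tokens or one of the two negated-set rules makes the problem disappear.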
Some notes: removing a few more token definitions at random seems to
clear up the problem. Removing either StrNoHash or StrNoQuotAmp also
removes the problem (and with both rules present, the blowup goes away
once the number of predefined tokens is slightly reduced). Changing
k=* to k=1 doesn't fix anything.
----------
grammar blowup_example;
options {
k=*;
}
tokens {
AMP = 'amp';
APOS = 'apos';
ANCESTOR = 'ancestor';
ANCESTOR_OR_SELF = 'ancestor-or-self';
AND = 'and';
AS = 'as';
ASCENDING = 'ascending';
ASTERISK = '*';
AT = 'at';
AT_SYMBOL = '@';
ATTRIBUTE = 'attribute';
BAR = '|';
BASE_URI = 'base-uri';
BOUNDARY_SPACE = 'boundary-space';
BY = 'by';
CASE = 'case';
CAST = 'cast';
CASTABLE = 'castable';
CHILD = 'child';
COLLATION = 'collation';
COLON = ':';
COLON_EQUAL = ':=';
COMMA = ',';
COMMENT = 'comment';
CONSTRUCTION = 'construction';
COPY_NAMESPACES = 'copy-namespaces';
DASH = '-';
DCOLON = '::';
DECLARE = 'declare';
DEFAULT = 'default';
DESCENDENT = 'descendant';
DESCENDENT_OR_SELF = 'descendant-or-self';
DESCENDING = 'descending';
DIGITS_PREFIX = '&#';
DIV = 'div';
DOCUMENT = 'document';
DOCUMENT_NODE = 'document-node';
DOLLAR = '$';
DOT = '.';
E_LOWER = 'e';
E_UPPER = 'E';
ELEMENT = 'element';
SLASH_R_ANGLE_BRACKET = '/>';
ELSE = 'else';
EMPTY = 'empty';
ENCODING = 'encoding';
EQ = '=';
EQ_NAMED = 'eq';
EVERY = 'every';
EXCEPT = 'except';
EXTERNAL = 'external';
FOLLOWING = 'following';
FOLLOWING_SIBLING = 'following-sibling';
FOR = 'for';
FUNCTION = 'function';
GE_NAMED = 'ge';
GREATEST = 'greatest';
GT_NAMED = 'gt';
GT_EQ = '>=';
HASH_RPAREN = '#)';
HEX_DIGITS_PREFIX = '&#x';
IDIV = 'idiv';
IF = 'if';
IMPORT = 'import';
IN = 'in';
INHERIT = 'inherit';
INSTANCE = 'instance';
INTERSECT = 'intersect';
IS = 'is';
ITEM = 'item';
LAX = 'lax';
L_ANGLE_BRACKET = '<';
L_ANGLE_BRACKET_SLASH = '</';
L_ANGLE_BRACKET2 = '<<';
LBRACKET = '[';
LCURLY = '{';
LCURLY2 = '{{';
LE_NAMED = 'le';
LEAST = 'least';
LET = 'let';
LPAREN = '(';
LPAREN_HASH = '(#';
LT_NAMED = 'lt';
LT_EQ = '<=';
MOD = 'mod';
MODULE = 'module';
MUL = 'mul';
NAMESPACE = 'namespace';
NE = '!=';
NE_NAMED = 'ne';
NO_INHERIT = 'no-inherit';
NO_PRESERVE = 'no-preserve';
NODE = 'node';
OF = 'of';
OPTION = 'option';
OR = 'or';
ORDER = 'order';
ORDERED = 'ordered';
ORDERING = 'ordering';
PARENT = 'parent';
PLUS_SIGN = '+';
PRECEDING = 'preceding';
PRECEDING_SIBLING = 'preceding-sibling';
PRESERVE = 'preserve';
PROCESSING_INSTRUCTION = 'processing-instruction';
QUESTION = '?';
QUOT = 'quot';
R_ANGLE_BRACKET = '>';
R_ANGLE_BRACKET2 = '>>';
RBRACKET = ']';
RCURLY = '}';
RCURLY2 = '}}';
RPAREN = ')';
RETURN = 'return';
SATISFIES = 'satisfies';
SCHEMA = 'schema';
SCHEMA_ATTRIBUTE = 'schema-attribute';
SCHEMA_ELEMENT = 'schema-element';
SELF = 'self';
SLASH = '/';
SLASH2 = '//';
SOME = 'some';
STABLE = 'stable';
STRICT = 'strict';
STRIP = 'strip';
TEXT = 'text';
THEN = 'then';
TO = 'to';
TREAT = 'treat';
TYPESWITCH = 'typeswitch';
VALIDATE = 'validate';
Quot = '"';
Apos = '\'';
EscapeQuot = '""';
EscapeApos = '\'\'';
AbbrevReverseStep = '..';
Separator = ';';
VARIABLE = 'variable';
VERSION = 'version';
VOID = 'void';
UNION = 'union';
UNORDERED = 'unordered';
WHERE = 'where';
XQUERY = 'xquery';
}
literal : IntegerLiteral;
IntegerLiteral : Digit+;
StrNoHash : CharNoHash*;
fragment CharNoHash : ~'#';
StrNoQuotAmp : CharNoQuotAmp*;
fragment CharNoQuotAmp : ~('"' | '&');
fragment Digit : ('0'..'9');