[antlr-interest] Problem, with minimal problem-causing grammar

Phil Oliver antlr at olivercomputing.com
Sat Jun 9 21:45:06 PDT 2007


I've been having problems defining a grammar in ANTLRworks (1.0.2, the 
latest version): I keep getting the "java.net.ConnectException: Connection 
refused: connect" error. Contrary to what some have assumed, it does not 
appear to have anything to do with port numbers; it appears to stem from 
an uncaught Java out-of-memory error. Every grammar that shows this 
problem in ANTLRworks also blows up with an out-of-memory error when I 
run its generated code in Eclipse, despite the JVM having over a gigabyte 
of RAM available.

This is (almost) the minimal grammar I could create that causes the 
problem, and it is a very simple one. No grammar errors are flagged. 
If there's an obvious problem to fix (one that actually does fix it), I'd 
appreciate some feedback. Otherwise it looks like a bug in ANTLR to 
me; my guess is that the generated code ends up in some kind of 
infinite loop that depletes memory. To reiterate what I've posted 
before, the memory blowup occurs in the generated Lexer, in this 
code section:

     static {
         int numStates = DFA4_transitionS.length;
         DFA4_transition = new short[numStates][];
         for (int i=0; i<numStates; i++) {
             DFA4_transition[i] = DFA.unpackEncodedString(DFA4_transitionS[i]);
         }
     }

specifically, in the invocation of DFA.unpackEncodedString.
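
For readers unfamiliar with that call, here is a minimal sketch of the kind of run-length decoding it appears to perform. This is an illustration only, not the ANTLR runtime source: the class name RleSketch, the method name unpack, and the assumption that each packed string is a sequence of (count, value) character pairs are all mine.

```java
// Hypothetical stand-in for a run-length decoder; not copied from ANTLR.
public class RleSketch {

    // Assumes the packed string holds (count, value) char pairs.
    static short[] unpack(String packed) {
        // First pass: sum the counts to learn the size of the unpacked row.
        int size = 0;
        for (int i = 0; i + 1 < packed.length(); i += 2) {
            size += packed.charAt(i);
        }
        short[] out = new short[size];

        // Second pass: write each value 'count' times.
        int pos = 0;
        for (int i = 0; i + 1 < packed.length(); i += 2) {
            int count = packed.charAt(i);
            short value = (short) packed.charAt(i + 1);
            for (int j = 0; j < count; j++) {
                out[pos++] = value;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Three 7s followed by two -1s (0xFFFF narrows to -1 as a short).
        short[] row = unpack("\u0003\u0007\u0002\uffff");
        System.out.println(java.util.Arrays.toString(row)); // prints [7, 7, 7, -1, -1]
    }
}
```

The point of the sketch is that a short packed string can expand into a very large array: a single pair with a count of 65535 unpacks to 65535 shorts, so the memory cost is set by the counts, not by the length of the string in the generated source.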

Some notes: removing a few more token definitions, chosen at random, 
seems to clear up the problem. Removing either StrNoHash or StrNoQuotAmp 
removes the problem as well (with both rules present, the grammar does 
not blow up once the number of predefined tokens is slightly reduced). 
Changing k=* to k=1 doesn't fix anything.
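
The sensitivity to the number of tokens would be consistent with a table whose size grows with the number of DFA states, rather than with a true infinite loop. The following back-of-envelope sketch rests on assumptions I have not verified against the generated lexer: that every state gets one row of shorts, that a rule such as ~'#' makes each row span close to the full 16-bit character range, and the state counts themselves, which are made-up figures. The class and method names are hypothetical.

```java
// Hypothetical estimate of DFA transition table size; all inputs are assumptions.
public class TableSizeEstimate {

    // Bytes needed for numStates rows of charsPerRow shorts each (array headers ignored).
    static long estimateBytes(int numStates, int charsPerRow) {
        return (long) numStates * charsPerRow * Short.BYTES;
    }

    public static void main(String[] args) {
        int charsPerRow = 0x10000; // assumed: a row covers the whole 16-bit char range
        for (int states : new int[] {100, 1000, 10000}) { // made-up state counts
            long mb = estimateBytes(states, charsPerRow) / (1024 * 1024);
            System.out.println(states + " states -> about " + mb + " MB");
        }
    }
}
```

Under these assumptions, a lexer DFA with several thousand states would need on the order of a gigabyte for its transition table alone, which would exhaust the heap without any loop being infinite. Counting the entries of DFA4_transitionS in the generated lexer would show whether the real state count is anywhere near that range.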

----------

grammar blowup_example;

options {
	k=*;
}

tokens {
	AMP				= 'amp';
	APOS				= 'apos';
	ANCESTOR			= 'ancestor';
	ANCESTOR_OR_SELF		= 'ancestor-or-self';
	AND				= 'and';
	AS				= 'as';
	ASCENDING			= 'ascending';
	ASTERISK			= '*';
	AT				= 'at';
	AT_SYMBOL			= '@';
	ATTRIBUTE			= 'attribute';
	BAR				= '|';
	BASE_URI			= 'base-uri';
	BOUNDARY_SPACE			= 'boundary-space';
	BY				= 'by';	
	CASE				= 'case';
	CAST				= 'cast';
	CASTABLE			= 'castable';
	CHILD				= 'child';
	COLLATION			= 'collation';
	COLON				= ':';
	COLON_EQUAL			= ':=';
	COMMA				= ',';
	COMMENT				= 'comment';
	CONSTRUCTION			= 'construction';
	COPY_NAMESPACES			= 'copy-namespaces';
	DASH				= '-';
	DCOLON				= '::';
	DECLARE				= 'declare';
	DEFAULT				= 'default';
	DESCENDENT			= 'descendant';
	DESCENDENT_OR_SELF		= 'descendant-or-self';
	DESCENDING			= 'descending';
	DIGITS_PREFIX			= '&#';
	DIV				= 'div';
	DOCUMENT			= 'document';
	DOCUMENT_NODE			= 'document-node';
	DOLLAR				= '$';
	DOT				= '.';
	E_LOWER				= 'e';
	E_UPPER				= 'E';
	ELEMENT				= 'element';
	SLASH_R_ANGLE_BRACKET		= '/>';
	ELSE				= 'else';
	EMPTY				= 'empty';
	ENCODING			= 'encoding';
	EQ				= '=';
	EQ_NAMED			= 'eq';
	EVERY				= 'every';
	EXCEPT				= 'except';
	EXTERNAL			= 'external';
	FOLLOWING			= 'following';
	FOLLOWING_SIBLING		= 'following-sibling';
	FOR				= 'for';
	FUNCTION			= 'function';
	GE_NAMED			= 'ge';
	GREATEST			= 'greatest';
	GT_NAMED			= 'gt';
	GT_EQ				= '>=';
	HASH_RPAREN			= '#)';
	HEX_DIGITS_PREFIX		= '&#x';
	IDIV				= 'idiv';
	IF				= 'if';
	IMPORT				= 'import';
	IN				= 'in';
	INHERIT				= 'inherit';
	INSTANCE			= 'instance';
	INTERSECT			= 'intersect';
	IS				= 'is';
	ITEM				= 'item';
	LAX				= 'lax';
	L_ANGLE_BRACKET			= '<';
	L_ANGLE_BRACKET_SLASH		= '</';
	L_ANGLE_BRACKET2		= '<<';
	LBRACKET			= '[';
	LCURLY				= '{';
	LCURLY2				= '{{';
	LE_NAMED			= 'le';
	LEAST				= 'least';
	LET				= 'let';
	LPAREN				= '(';
	LPAREN_HASH			= '(#';
	LT_NAMED			= 'lt';
	LT_EQ				= '<=';
	MOD				= 'mod';
	MODULE				= 'module';
	MUL				= 'mul';
	NAMESPACE			= 'namespace';
	NE				= '!=';
	NE_NAMED			= 'ne';
	NO_INHERIT			= 'no-inherit';
	NO_PRESERVE			= 'no-preserve';
	NODE				= 'node';
	OF				= 'of';
	OPTION				= 'option';
	OR				= 'or';
	ORDER				= 'order';		
	ORDERED				= 'ordered';
	ORDERING			= 'ordering';
	PARENT				= 'parent';
	PLUS_SIGN			= '+';
	PRECEDING			= 'preceding';
	PRECEDING_SIBLING		= 'preceding-sibling';
	PRESERVE			= 'preserve';
	PROCESSING_INSTRUCTION		= 'processing-instruction';
	QUESTION			= '?';
	QUOT				= 'quot';
	R_ANGLE_BRACKET			= '>';
	R_ANGLE_BRACKET2		= '>>';		
	RBRACKET			= ']';	
	RCURLY				= '}';	
	RCURLY2				= '}}';
	RPAREN				= ')';	
	RETURN				= 'return';
	SATISFIES			= 'satisfies';
	SCHEMA				= 'schema';
	SCHEMA_ATTRIBUTE		= 'schema-attribute';
	SCHEMA_ELEMENT			= 'schema-element';
	SELF				= 'self';	
	SLASH				= '/';	
	SLASH2				= '//';		
	SOME				= 'some';
	STABLE				= 'stable';
	STRICT				= 'strict';
	STRIP				= 'strip';
	TEXT				= 'text';
	THEN				= 'then';	
	TO				= 'to';
	TREAT				= 'treat';
	TYPESWITCH			= 'typeswitch';	
	VALIDATE			= 'validate';	
	Quot				= '"';
	Apos				= '\'';
	EscapeQuot			= '""';
	EscapeApos			= '\'\'';	
	AbbrevReverseStep		= '..';	
	Separator			= ';';
	VARIABLE			= 'variable';	
	VERSION				= 'version';	
	VOID				= 'void';			
	UNION				= 'union';
	UNORDERED			= 'unordered';	
	WHERE				= 'where';	
	XQUERY				= 'xquery';	
}

literal				: IntegerLiteral;
IntegerLiteral		 	: Digit+;	

StrNoHash			: CharNoHash*;
fragment CharNoHash		: ~'#';

StrNoQuotAmp			: CharNoQuotAmp*;
fragment CharNoQuotAmp		: ~('"' | '&');

fragment Digit			: ('0'..'9');


