[antlr-interest] Workaround for the "code too large" problem

Thomas VIAL tvial at octo.com
Sun Jan 20 09:59:14 PST 2008


Hi,

I have been running into the "Code too large" problem. I have already changed my tokens to abstract ones with a
keyword lookup table (simplifying lexer code, as per
http://www.antlr.org/pipermail/antlr-interest/2007-August/023007.html). It worked for a while but when I threw in
a few more tokens the error came back again, on the parser this time.

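I have identified 2 causes:
- the big inline tokenNames array, which seems to serve informational purposes only (not used internally by ANTLR)
- the big static initialization block at the end of the file, for the FOLLOW_* BitSets

Both are initialized statically, so they are compiled into the single class initializer method, which is subject to
Java's 64 KB limit on the bytecode of one method.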

My grammar has about 400 tokens, which is apparently too many. As a workaround I have put together a small awk script
(see below), to be used as a post-processing step after generating the parser. It removes the supposedly unused
tokenNames array (and its getter), and splits the BitSet initialization block into small methods called from the
constructor.
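
A minimal sketch of that post-processing step as a shell helper, assuming the awk script has been saved to a file.
The function name postprocess_parser and the file names in the usage line are hypothetical; the generated file is
only replaced if awk exits successfully.

```shell
# Hypothetical helper: run an awk post-processing script over a generated
# parser source file and replace the file only if awk succeeds.
postprocess_parser() {
    script=$1          # path to the awk script (assumed saved to a file)
    parser=$2          # path to the generated parser source
    tmp="$parser.tmp"  # scratch file next to the original

    if awk -f "$script" "$parser" > "$tmp"; then
        mv "$tmp" "$parser"
    else
        # Leave the original untouched and clean up the partial output.
        rm -f "$tmp"
        return 1
    fi
}
```

It could then be called right after ANTLR has run, for example as
`postprocess_parser split-bitsets.awk MyGrammarParser.java` (both names are placeholders).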

Of course it relies heavily on the precise layout and formatting used by ANTLR when generating the parser; it is
designed for ANTLR v3.0.1. And it works like a charm ;)
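
To make the layout dependence concrete, here is a minimal sketch of how one FOLLOW_* declaration gets taken apart:
the field name is the fifth whitespace-separated field, and the initializer is whatever follows " = ". The sample
line and the helper name split_bitset_decl are made up for illustration; real generated lines may differ between
ANTLR versions.

```shell
# Hypothetical helper: show which parts of a BitSet declaration line
# the name and the initializer are taken from.
split_bitset_decl() {
    printf '%s\n' "$1" | awk '{
        split($0, decl, " = ");   # decl[2] is everything after " = "
        print "name=" $5;         # fields: public static final BitSet NAME
        print "init=" decl[2];
    }'
}

# Made-up sample line in the layout the script expects.
split_bitset_decl '    public static final BitSet FOLLOW_expr_in_stat42 = new BitSet(new long[]{0x0000000000000002L});'
# prints:
# name=FOLLOW_expr_in_stat42
# init=new BitSet(new long[]{0x0000000000000002L});
```

If the generator ever put an extra modifier before the name, or dropped the spaces around the equals sign, both
extractions would silently pick up the wrong text.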

Hoping it will help others who have this very annoying "code too large" error!

Thomas


------------------------------------------------------------
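# Emit the pending BitSet assignments as one small private method.
# The extra parameter i only serves as a local loop variable.
function dumpBitSetGroup(    i) {
	print "\tprivate void bitSetInitGroup_" bitSetGroups "()\n\t{";
	for (i = 0; i < bitSetIdx; i++)
		print "\t\t" (bitSetName[i]) " = " (bitSetDef[i]);
	print "\t}\n";
	bitSetIdx = 0;
	bitSetGroups++;
}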

BEGIN {
	BITSET_GROUPING = 50;

	skipLine = 0;
	tokenNamesDef = 0;	# set only while inside the tokenNames array
	ctrLines = 0;
	inConstructor = 0;
	bitSetIdx = 0;
	bitSetGroups = 0;
}

# Get class name
/^public class/ {
	className = $3;
}

# Token names definition - we probably don't need it
/public static .* tokenNames/ {
	skipLine = 1;
	tokenNamesDef = 1;
}
# The array declaration spans 3 lines, skip them all
($0 ~ /^[[:blank:]]*"<invalid>"/) && tokenNamesDef {
	skipLine = 1;
}
($0 ~ /^[[:blank:]]*};[[:blank:]]*$/) && tokenNamesDef {
	skipLine = 1;
	tokenNamesDef = 0;
}

# Remove the getter method accordingly (it fits on one line)
/public .* getTokenNames/ {
	skipLine = 1;
}

# Remember constructor definition, we will output it later
($0 ~ /^[[:blank:]]*public/) && ($2 ~ ("^" className "\\(")) {
	inConstructor = 1;
}
(inConstructor) {
	constructor[ctrLines++] = $0;
	skipLine = 1;
}
($0 ~ /^[[:blank:]]*}[[:blank:]]*$/) && inConstructor {
	inConstructor = 0;
}

# BitSet's definition
/^[[:blank:]]*public static final BitSet FOLLOW_/ {
	# Remember definition
	split($0, decl, " = ");
	bitSetName[bitSetIdx] = $5;
	bitSetDef[bitSetIdx++] = decl[2];

	# Reduce to simple declaration, no initialization (but remove "final" attribute!)
	$0 = "\tpublic static BitSet " $5 ";"

	# Output group
	if (bitSetIdx == BITSET_GROUPING)
		dumpBitSetGroup();
}

# End of file
/^}$/ {
	# Remaining BitSet's
	dumpBitSetGroup();

	# Constructor
	print constructor[0];
	print constructor[1];
	for (i = 0; i < bitSetGroups; i++)
		print "\t\t\tbitSetInitGroup_" i "();";
	for (i = 2; i < ctrLines; i++)
		print constructor[i];
}

# Output
(!skipLine) {
	print;
}
{
	skipLine = 0;
}


