[antlr-interest] How to overcome Lexer generation OutOfMemoryError?

Robert Wentworth bob at wentworth.bz
Wed Oct 21 19:32:10 PDT 2009

Generating code for the grammar below fails with an OutOfMemoryError
while ANTLR is analyzing a DFA for the Lexer.

Can anyone suggest how I can successfully generate an ANTLR Lexer for  
this grammar (or an equivalent grammar)?


(1) For STRINGFIELD and NUMBERFIELD, I've tried both explicitly listing
all the words I want to match (as in the commented text) and
constructing a rule that amounts to a decision tree matching the same
fields (the non-commented version). Both approaches lead to the same
OutOfMemoryError.
(2) The problem apparently arises from an interaction with the WORD
rule. If I have just STRINGFIELD and NUMBERFIELD, or just WORD, ANTLR
generates code easily.

(3) The grammar I really want to generate has several more Lexer rules  
of comparable complexity.


Thank you,
Bob Wentworth


grammar LISQueryLex;

options {
	backtrack = true;
	memoize = true;
	output = template;
}

@parser::header { package dummy; }
@lexer::header { package dummy; }

WHITESPACE	:	(' ' | '\t' | '\r' | '\n')+ {$channel = HIDDEN;} ;

STRINGFIELD	:	  'datatext' | 'all'
			| 'text' | 'txt'
			| 'billtext' | 'btxt'
			| 'reporttext' | 'rtxt'
			| 'legislationtp' | 'ltp'
			| 'titlesummarysubjecttext' | 'wordphrase' | 'tisstxt' | 'wp'
			| 'titlesummarysubject' | 'tisummsubj' | 'tiss'
			| 'titlesummary' | 'tisumm'
			| 'title' | 'ti'
			| 'latesttitle' | 'lti'
			| 'member' | 'memb'
			| 'sponsorcosponsor' | 'spco'
			| 'sponsor' | 'spon'
			| 'cosponsor' | 'cosp'
			| 'originalcosponsor' | 'ocosp'
			| 'withdrawncosponsor' | 'wcosp'
			| 'committee' | 'comm'
			| 'committeeactivitytype' | 'commactvtp'
			| 'status' | 'action' | 'actn'
			| 'lateststatus' | 'latestaction' | 'lactn'
			| 'majoraction' | 'mactn'
			| 'latestmajoraction' | 'lmactn'
			| 'subject' | 'subj' | 'sterm'
			| 'topterm' | 'tterm'
			| 'primaryterm' | 'pterm'
			| 'summary' | 'summ'
			| 'latestsummary' | 'lsumm'
			| 'note'
			| 'lawdate' | 'lawdt'
			| 'reporttype' | 'rtp'
			| 'version' | 'ver'
			| 'relationship' | 'related' | 'rel'
			| 'calendartype' | 'caltp'
			| 's_foo' | 's_bar' ;

NUMBERFIELD	:	  'congress' | 'cong' | 'cno'
			| 'legislationnumber' | 'lno'
			| 'legislationinteger' | 'lin'
			| 'legislationpart' | 'lpt'
			| 'titletypecode' | 'titpcd'
			| 'membercode' | 'membcd'
			| 'district' | 'dist'
			| 'sponsorcosponsorcode' | 'spcocd'
			| 'sponsorcosponsordistrict' | 'spcodist'
			| 'sponsorcode' | 'sponcd'
			| 'sponsordistrict' | 'spondist'
			| 'sponsortypecode' | 'spontpcd'
			| 'cosponsorcode' | 'cospcd'
			| 'cosponsordistrict' | 'cospdist'
			| 'cosponsorcount' | 'cospct'
			| 'originalcosponsorcode' | 'ocospcd'
			| 'originalcosponsordistrict' | 'ocospdist'
			| 'originalcosponsorcount' | 'ocospct'
			| 'withdrawncosponsorcode' | 'wcospcd'
			| 'withdrawncosponsordistrict' | 'wcospdist'
			| 'withdrawncosponsorcount' | 'wcospct'
			| 'committeeactivitytypecode' | 'commactvtpcd'
			| 'housecommitteecount' | 'hcommct'
			| 'senatecommitteecount' | 'scommct'
			| 'statussessiontypecode' | 'actionsessiontypecode' | 'actnsntpcd'
			| 'majoractioncode' | 'mactncd'
			| 'latestmajoractioncode' | 'lmactncd'
			| 'summarytypecode' | 'summtpcd'
			| 'relationshipcode' | 'relatedcode' | 'relcd'
			| 'crpageinteger' | 'crpin'
			| 'voteinteger' | 'votein'
			| 'calendarintegers' | 'calin'
			| 'amendmentstointegers' | 'amdtstoin'
			| 'searchcode' | 'srchcd'
			| 'n_foo' | 'n_bar' ;

WORD		:	(~(' '|'\t'|'\r'|'\n'|'('|')'|','|'='|'!'|'<'|'>'|'"'|'-'|'/'))+ ;
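One workaround sometimes used for this kind of keyword/identifier overlap in ANTLR 3 lexers is to drop STRINGFIELD and NUMBERFIELD as separate lexer rules and let WORD match every word. The token is then reclassified afterwards with a hash lookup, for example in a WORD action that assigns `$type` to an imaginary token declared in a `tokens { ... }` section. The Java below is a minimal sketch of the lookup half only, under stated assumptions: the class, enum and method names (`FieldKeywords`, `Kind`, `classify`) are hypothetical, and only a handful of the grammar's keywords are included.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class FieldKeywords {
    /** Hypothetical categories a matched WORD can be reclassified into. */
    public enum Kind { STRINGFIELD, NUMBERFIELD, WORD }

    // A small subset of the STRINGFIELD alternatives from the grammar.
    private static final Set<String> STRING_FIELDS = new HashSet<>(Arrays.asList(
        "datatext", "all", "text", "txt", "title", "ti", "note",
        "sponsor", "spon", "cosponsor", "cosp"));

    // A small subset of the NUMBERFIELD alternatives from the grammar.
    private static final Set<String> NUMBER_FIELDS = new HashSet<>(Arrays.asList(
        "congress", "cong", "cno", "district", "dist",
        "sponsorcode", "sponcd"));

    /**
     * Classifies text the lexer has already matched as a WORD.
     * Exact, case-sensitive match, like the quoted literals in the grammar.
     */
    public static Kind classify(String word) {
        if (STRING_FIELDS.contains(word)) {
            return Kind.STRINGFIELD;
        }
        if (NUMBER_FIELDS.contains(word)) {
            return Kind.NUMBERFIELD;
        }
        return Kind.WORD;
    }

    public static void main(String[] args) {
        // "titles" is not a keyword, so it stays a plain WORD.
        for (String w : new String[] {"ti", "cong", "titles"}) {
            System.out.println(w + " -> " + classify(w));
        }
    }
}
```

With this split, the generated lexer would only need to recognize the single WORD pattern, so its DFA should no longer have to encode every keyword prefix alongside WORD. The trade-off is that the keyword spellings move out of the grammar and into a table in target-language code.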


