[antlr-interest] How to overcome Lexer generation OutOfMemoryError?

Gordon Tyler Gordon.Tyler at quest.com
Thu Oct 22 06:32:21 PDT 2009


This may be a silly question, but have you tried increasing the JVM's maximum heap size with -Xmx?

-----Original Message-----
From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Robert Wentworth
Sent: October 21, 2009 10:32 PM
To: antlr-interest at antlr.org
Subject: [antlr-interest] How to overcome Lexer generation OutOfMemoryError?

Generating code for the grammar below fails with an OutOfMemoryError
that occurs while ANTLR is analyzing a DFA for the Lexer.

Can anyone suggest how I can successfully generate an ANTLR Lexer for  
this grammar (or an equivalent grammar)?

Notes:

(1) For STRINGFIELD and NUMBERFIELD, I've tried both listing out all  
words I want to match explicitly (as in the commented text), and  
constructing a rule that amounts to a decision tree to match the same  
fields (the non-commented version). Both approaches lead to the same  
problem.

(2) The problem apparently arises from an interaction with the WORD
rule.  If I have just STRINGFIELD and NUMBERFIELD, or just WORD, then
ANTLR easily generates code.

(3) The grammar I really want to generate has several more Lexer rules  
of comparable complexity.
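
For illustration, a "decision tree" rule like the non-commented ones in note (1) can be generated mechanically from the explicit word list by building a prefix tree and factoring out common prefixes. The following is a minimal Python sketch under stated assumptions: the function names (build_trie, render, factored_rule) are hypothetical, this is not necessarily the script that produced the rules in this post, and the words are assumed to contain no quote characters that would need escaping.

```python
def build_trie(words):
    """Return a nested-dict trie; the key '' marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return root


def render(node):
    """Render one trie node as a list of ANTLR alternatives (strings)."""
    alts = []
    for ch in sorted(key for key in node if key):
        child = node[ch]
        prefix = ch
        # Collapse a chain of single-child nodes into one string literal.
        while len(child) == 1 and "" not in child:
            nxt, child = next(iter(child.items()))
            prefix += nxt
        tail = render(child)
        if tail == [""]:
            # Nothing follows this prefix: emit a plain literal.
            alts.append(f"'{prefix}'")
        else:
            alts.append(f"'{prefix}'(" + "|".join(tail) + ")")
    if "" in node:
        # Empty alternative: the text matched so far is itself a word.
        alts.append("")
    return alts


def factored_rule(name, words):
    """Build a lexer rule that matches exactly the given words."""
    return f"{name} : (" + "|".join(render(build_trie(words))) + ");"


print(factored_rule("STRINGFIELD", ["action", "actn", "all"]))
# prints: STRINGFIELD : ('a'('ct'('ion'|'n')|'ll'));
```

A word that is a prefix of another word (such as 'comm' and 'committee') yields an empty alternative, which is the same shape as the ('ittee'|) style groups in the rules below.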

Thoughts?

Thank you,
Bob Wentworth


/////////////////////////////////////////////////////////////////////

grammar LISQueryLex;

options {
	backtrack = true;
	memoize = true;
	output = template;
}

@parser::header { package dummy; }
@lexer::header { package dummy; }



WHITESPACE	:	(' ' | '\t' | '\r' | '\n')+ {$channel = HIDDEN;} ;

/*
STRINGFIELD	:	  'datatext' | 'all'
			| 'text' | 'txt'
			| 'billtext' | 'btxt'
			| 'reporttext' | 'rtxt'
			| 'legislationtp' | 'ltp'
			| 'titlesummarysubjecttext' | 'wordphrase' | 'tisstxt' | 'wp' | 'default'
			| 'titlesummarysubject' | 'tisummsubj' | 'tiss'
			| 'titlesummary' | 'tisumm'
			| 'title' | 'ti'
			| 'latesttitle' | 'lti'
			| 'member' | 'memb'
			| 'sponsorcosponsor' | 'spco'
			| 'sponsor' | 'spon'
			| 'cosponsor' | 'cosp'
			| 'originalcosponsor' | 'ocosp'
			| 'withdrawncosponosor' | 'wcosp'
			| 'committee' | 'comm'
			| 'committeeactivitytype' | 'commactvtp'
			| 'status' | 'action' | 'actn'
			| 'lateststatus' | 'latestaction' | 'lactn'
			| 'majoraction' | 'mactn'
			| 'latestmajoraction' | 'lmactn'
			| 'subject' | 'subj' | 'sterm'
			| 'topterm' | 'tterm'
			| 'primaryterm' | 'pterm'
			| 'summary' | 'summ'
			| 'latestsummary' | 'lsumm'
			| 'note'
			| 'lawdate' | 'lawdt'
			| 'reporttype' | 'rtp'
			| 'version' | 'ver'
			| 'relationship' | 'related' | 'rel'
			| 'calendartype' | 'caltp'
			| 's_foo' | 's_bar' ;
*/
STRINGFIELD :
	( 'a'('ct'('ion'|'n')|'ll')
	| 'b'('illtext'|'txt')
	| 'c'('al'('endartype'|'tp')|'o'('mm'('actvtp'|'ittee'('activitytype'|)|)|'sp'('onsor'|)))
	| 'd'('atatext'|'efault')
	| 'l'('a'('ctn'|'test'('action'|'majoraction'|'s'('tatus'|'ummary')|'title')|'wd'('ate'|'t'))|'egislationtp'|'mactn'|'summ'|'t'('i'|'p'))
	| 'm'('a'('ctn'|'joraction')|'emb'('er'|))
	| 'note'
	| 'o'('cosp'|'riginalcosponsor')
	| 'p'('rimaryterm'|'term')
	| 'r'('e'('l'('at'('ed'|'ionship')|)|'portt'('ext'|'ype'))|'t'('p'|'xt'))
	| 's'('p'('co'|'on'('sor'('cosponsor'|)|))|'t'('atus'|'erm')|'u'('bj'('ect'|)|'mm'('ary'|))|'_'('bar'|'foo'))
	| 't'('ext'|'i'('s'('s'('txt'|)|'umm'('subj'|))|'tle'('summary'('subject'('text'|)|)|)|)|'opterm'|'term'|'xt')
	| 'ver'('sion'|)
	| 'w'('cosp'|'ithdrawncosponosor'|'ordphrase'|'p')
	) ;

/*
NUMBERFIELD	:	  'congress' | 'cong' | 'cno'
			| 'legislationnumber' | 'lno'
			| 'legislationinteger' | 'lin'
			| 'legislationpart' | 'lpt'
			| 'titletypecode' | 'titpcd'
			| 'membercode' | 'membcd'
			| 'district' | 'dist'
			| 'sponsorcosponsorcode' | 'spcocd'
			| 'sponsorcosponsordistrict' | 'spcodist'
			| 'sponsorcode' | 'sponcd'
			| 'sponsordistrict' | 'spondist'
			| 'sponsortypecode' | 'spontpcd'
			| 'cosponsorcode' | 'cospcd'
			| 'cosponsordistrict' | 'cospdist'
			| 'cosponsorcount' | 'cospct'
			| 'originalcosponosorcode' | 'ocospcd'
			| 'originalcosponsordistrict' | 'ocospdist'
			| 'originalcosponsorcount' | 'ocospct'
			| 'withdrawncosponosorcode' | 'wcospcd'
			| 'withdrawncosponsordistrict' | 'wcospdist'
			| 'withdrawncosponosorcount' | 'wcospct'
			| 'committeeactivitytypecode' | 'commactvtpcd'
			| 'housecommitteecount' | 'hcommct'
			| 'senatecommitteecount' | 'scommct'
			| 'statussessiontypecode' | 'actionsessiontypecode' | 'actnsntpcd'
			| 'majoractioncode' | 'mactncd'
			| 'latestmajoractioncode' | 'lmactncd'
			| 'summarytypecode' | 'summtpcd'
			| 'relationshipcode' | 'relatedcode' | 'relcd'
			| 'crpageinteger' | 'crpin'
			| 'voteinteger' | 'votein'
			| 'calendarintegers' | 'calin'
			| 'amendmentstointegers' | 'amdtstoin'
			| 'searchcode' | 'srchcd'
			| 'n_foo' | 'n_bar' ;
*/
NUMBERFIELD :
	( 'a'('ct'('ionsessiontypecode'|'nsntpcd')|'m'('dtstoin'|'endmentstointegers'))
	| 'c'('al'('endarintegers'|'in')|'no'|'o'('mm'('actvtpcd'|'itteeactivitytypecode')|'ng'('ress'|)|'sp'('c'('d'|'t')|'dist'|'onsor'('co'('de'|'unt')|'district')))|'rp'('ageinteger'|'in'))
	| 'dist'('rict'|)
	| 'h'('commct'|'ousecommitteecount')
	| 'l'('atestmajoractioncode'|'egislation'('integer'|'number'|'part')|'in'|'mactncd'|'no'|'pt')
	| 'm'('a'('ctncd'|'joractioncode')|'emb'('cd'|'ercode'))
	| 'n_'('bar'|'foo')
	| 'o'('cosp'('c'('d'|'t')|'dist')|'riginalcospon'('osorcode'|'sor'('count'|'district')))
	| 'rel'('at'('edcode'|'ionshipcode')|'cd')
	| 's'('commct'|'e'('archcode'|'natecommitteecount')|'p'('co'('cd'|'dist')|'on'('cd'|'dist'|'sor'('co'('de'|'sponsor'('code'|'district'))|'district'|'typecode')|'tpcd'))|'rchcd'|'tatussessiontypecode'|'umm'('arytypecode'|'tpcd'))
	| 'tit'('letypecode'|'pcd')
	| 'votein'('teger'|)
	| 'w'('cosp'('c'('d'|'t')|'dist')|'ithdrawncospon'('osorco'('de'|'unt')|'sordistrict'))
	) ;


WORD		:	(~(' '|'\t'|'\r'|'\n'|'('|')'|','|'='|'!'|'<'|'>'|'"'|'-'|'/'))+ ;

tmp	:	STRINGFIELD;
