[antlr-interest] How to overcome Lexer generation OutOfMemoryError?
Gordon Tyler
Gordon.Tyler at quest.com
Thu Oct 22 06:32:21 PDT 2009
This may be a silly question, but have you tried increasing the -Xmx for the JVM?
-----Original Message-----
From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Robert Wentworth
Sent: October 21, 2009 10:32 PM
To: antlr-interest at antlr.org
Subject: [antlr-interest] How to overcome Lexer generation OutOfMemoryError?
Trying to generate code for the grammar below generates an
OutOfMemoryError related to trying to analyze a DFA for the Lexer.
Can anyone suggest how I can successfully generate an ANTLR Lexer for
this grammar (or an equivalent grammar)?
Notes:
(1) For STRINGFIELD and NUMBERFIELD, I've tried both listing out all
words I want to match explicitly (as in the commented text), and
constructing a rule that amounts to a decision tree to match the same
fields (the non-commented version). Both approaches lead to the same
problem.
(2) Note that the problem apparently arises because of an interaction
with the presence of the WORD rule. If I have just STRINGFIELD and
NUMBERFIELD, or just WORD, then ANTLR easily generates code.
(3) The grammar I really want to generate has several more Lexer rules
of comparable complexity.
Thoughts?
Thank you,
Bob Wentworth
/////////////////////////////////////////////////////////////////////
grammar LISQueryLex;
options {
backtrack = true;
memoize = true;
output = template;
}
@parser::header { package dummy; }
@lexer::header { package dummy; }
WHITESPACE : (' ' | '\t' | '\r' | '\n')+ {$channel = HIDDEN;} ;
/*
STRINGFIELD : 'datatext' | 'all'
| 'text' | 'txt'
| 'billtext' | 'btxt'
| 'reporttext' | 'rtxt'
| 'legislationtp' | 'ltp'
| 'titlesummarysubjecttext' | 'wordphrase' | 'tisstxt' | 'wp' |
'default'
| 'titlesummarysubject' | 'tisummsubj' | 'tiss'
| 'titlesummary' | 'tisumm'
| 'title' | 'ti'
| 'latesttitle' | 'lti'
| 'member' | 'memb'
| 'sponsorcosponsor' | 'spco'
| 'sponsor' | 'spon'
| 'cosponsor' | 'cosp'
| 'originalcosponsor' | 'ocosp'
| 'withdrawncosponosor' | 'wcosp'
| 'committee' | 'comm'
| 'committeeactivitytype' | 'commactvtp'
| 'status' | 'action' | 'actn'
| 'lateststatus' | 'latestaction' | 'lactn'
| 'majoraction' | 'mactn'
| 'latestmajoraction' | 'lmactn'
| 'subject' | 'subj' | 'sterm'
| 'topterm' | 'tterm'
| 'primaryterm' | 'pterm'
| 'summary' | 'summ'
| 'latestsummary' | 'lsumm'
| 'note'
| 'lawdate' | 'lawdt'
| 'reporttype' | 'rtp'
| 'version' | 'ver'
| 'relationship' | 'related' | 'rel'
| 'calendartype' | 'caltp'
| 's_foo' | 's_bar' ;
*/
STRINGFIELD :
('a
'('ct
'('ion
'|
'n
')|
'll
')|
'b
'('illtext
'|
'txt
')|
'c
'('al
'('endartype
'|
'tp
')|
'o
'('mm
'('actvtp
'|
'ittee
'('activitytype
'|
)|
)|
'sp
'('onsor
'|
)))|
'd
'('atatext
'|
'efault
')|
'l
'('a
'('ctn
'|
'test
'('action
'|
'majoraction
'|
's
'('tatus
'|
'ummary
')|
'title
')|
'wd
'('ate
'|
't
'))|
'egislationtp
'|
'mactn
'|
'summ
'|
't
'('i
'|
'p
'))|
'm
'('a
'('ctn
'|
'joraction
')|
'emb
'('er
'|
))|
'note
'|
'o
'('cosp
'|
'riginalcosponsor
')|
'p
'('rimaryterm
'|
'term
')|
'r
'('e
'('l
'('at
'('ed
'|
'ionship
')|
)|
'portt
'('ext
'|
'ype
'))|
't
'('p
'|
'xt
'))|
's
'('p
'('co
'|
'on
'('sor
'('cosponsor
'|
)|
))|
't
'('atus
'|
'erm
')|
'u
'('bj
'('ect
'|
)|
'mm
'('ary
'|
))|
'_
'('bar
'|
'foo
'))|
't
'('ext
'|
'i
'('s
'('s
'('txt
'|
)|
'umm
'('subj
'|
))|
'tle
'('summary
'('subject
'('text
'|
)|
)|
)|
)|
'opterm
'|
'term
'|
'xt')|'ver'('sion'|)|'w'('cosp'|'ithdrawncosponosor'|'ordphrase'|'p'));
/*
NUMBERFIELD : 'congress' | 'cong' | 'cno'
| 'legislationnumber' | 'lno'
| 'legislationinteger' | 'lin'
| 'legislationpart' | 'lpt'
| 'titletypecode' | 'titpcd'
| 'membercode' | 'membcd'
| 'district' | 'dist'
| 'sponsorcosponsorcode' | 'spcocd'
| 'sponsorcosponsordistrict' | 'spcodist'
| 'sponsorcode' | 'sponcd'
| 'sponsordistrict' | 'spondist'
| 'sponsortypecode' | 'spontpcd'
| 'cosponsorcode' | 'cospcd'
| 'cosponsordistrict' | 'cospdist'
| 'cosponsorcount' | 'cospct'
| 'originalcosponosorcode' | 'ocospcd'
| 'originalcosponsordistrict' | 'ocospdist'
| 'originalcosponsorcount' | 'ocospct'
| 'withdrawncosponosorcode' | 'wcospcd'
| 'withdrawncosponsordistrict' | 'wcospdist'
| 'withdrawncosponosorcount' | 'wcospct'
| 'committeeactivitytypecode' | 'commactvtpcd'
| 'housecommitteecount' | 'hcommct'
| 'senatecommitteecount' | 'scommct'
| 'statussessiontypecode' | 'actionsessiontypecode' | 'actnsntpcd'
| 'majoractioncode' | 'mactncd'
| 'latestmajoractioncode' | 'lmactncd'
| 'summarytypecode' | 'summtpcd'
| 'relationshipcode' | 'relatedcode' | 'relcd'
| 'crpageinteger' | 'crpin'
| 'voteinteger' | 'votein'
| 'calendarintegers' | 'calin'
| 'amendmentstointegers' | 'amdtstoin'
| 'searchcode' | 'srchcd'
| 'n_foo' | 'n_bar' ;
*/
NUMBERFIELD :
('a
'('ct
'('ionsessiontypecode
'|
'nsntpcd
')|
'm
'('dtstoin
'|
'endmentstointegers
'))|
'c
'('al
'('endarintegers
'|
'in
')|
'no
'|
'o
'('mm
'('actvtpcd
'|
'itteeactivitytypecode
')|
'ng
'('ress
'|
)|
'sp
'('c
'('d
'|
't
')|
'dist
'|
'onsor
'('co
'('de
'|
'unt
')|
'district
')))|
'rp
'('ageinteger
'|
'in
'))|
'dist
'('rict
'|
)|
'h
'('commct
'|
'ousecommitteecount
')|
'l
'('atestmajoractioncode
'|
'egislation
'('integer
'|
'number
'|
'part
')|
'in
'|
'mactncd
'|
'no
'|
'pt
')|
'm
'('a
'('ctncd
'|
'joractioncode
')|
'emb
'('cd
'|
'ercode
'))|
'n_
'('bar
'|
'foo
')|
'o
'('cosp
'('c
'('d
'|
't
')|
'dist
')|
'riginalcospon
'('osorcode
'|
'sor
'('count
'|
'district
')))|
'rel
'('at
'('edcode
'|
'ionshipcode
')|
'cd
')|
's
'('commct
'|
'e
'('archcode
'|
'natecommitteecount
')|
'p
'('co
'('cd
'|
'dist
')|
'on
'('cd
'|
'dist
'|
'sor
'('co
'('de
'|
'sponsor
'('code
'|
'district
'))|
'district
'|
'typecode
')|
'tpcd
'))|
'rchcd
'|
'tatussessiontypecode
'|
'umm
'('arytypecode
'|
'tpcd
'))|
'tit
'('letypecode
'|
'pcd
')|
'votein
'('teger
'|
)|
'w
'('cosp
'('c
'('d
'|'t')|'dist')|'ithdrawncospon'('osorco'('de'|'unt')|'sordistrict')));
WORD : (~(' '|'\t'|'\r'|'\n'|'('|')'|','|'='|'!'|'<'|'>'|'"'|'-'|'/'))
+ ;
tmp : STRINGFIELD;
List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
More information about the antlr-interest
mailing list