[antlr-interest] Problems with memory consumption when generating parsers

Sun Dec 13 12:08:46 PST 2009

At 07:37 14/12/2009, Marcin Rzeźnicki wrote:
 >Specifically I constructed a sort of catch-all rule which I
 >called LINEOFTEXT and was like ~('\n' | '\r')*. After
 >replacing that with simple .* LINETERMINATOR my problems went
 >away.

Actually, the former is better than the latter, since it is
more specific; you were just missing a set of parentheses
around the negated set:
   (~('\n' | '\r'))* LINETERMINATOR
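
A minimal Python sketch of what the corrected rule accepts,
assuming LINETERMINATOR matches '\r\n', '\n' or '\r' (its
definition isn't shown in the thread); match_line_of_text is a
hypothetical helper, not ANTLR-generated code. Because the
negated set can never match '\n' or '\r', the loop has exactly
one exit point: the first line break.

```python
from typing import Optional, Tuple


def match_line_of_text(text: str, pos: int = 0) -> Optional[Tuple[str, int]]:
    # Mimics (~('\n' | '\r'))* LINETERMINATOR starting at `pos`.
    # Returns (line_body, next_pos), or None if no terminator follows.
    # Assumption: LINETERMINATOR is '\r\n', '\n' or '\r' (hypothetical).
    start = pos
    # (~('\n' | '\r'))* : take every character that is not a line break.
    while pos < len(text) and text[pos] not in "\r\n":
        pos += 1
    body = text[start:pos]
    # LINETERMINATOR is mandatory, so end of input means no match.
    if text.startswith("\r\n", pos):
        return body, pos + 2
    if pos < len(text):  # the loop only stops early on '\r' or '\n'
        return body, pos + 1
    return None
```

For example, match_line_of_text("hello\nworld") gives
("hello", 6), while a final line with no terminator gives None.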

 >ANTLR wasn't sure about typeArguments because they can be
 >arbitrarily nested (like in List<List<List<String>>>) so I
 >changed that to:
 >IDENTIFIER ( ( '<' ) => typeArguments )?
 >    ( '.' IDENTIFIER ( ( '<' ) => typeArguments )? )*
 >
 >because when I expect a typeIdentifier, '<' inevitably marks the
 >beginning of a type parameter list (I hope that's good reasoning).
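
The predicated rule quoted above can be mirrored in a
hand-written recursive-descent parser, where each ( '<' ) =>
predicate reduces to checking whether the next token is '<'.
A minimal Python sketch, assuming typeArguments has the
Java-like shape '<' typeIdentifier (',' typeIdentifier)* '>'
(the post doesn't show it); tokenize, TypeParser and parse_type
are hypothetical names. Because typeArguments calls back into
typeIdentifier, nesting such as List<List<List<String>>> needs
no special handling.

```python
import re
from typing import List, Optional

# Hypothetical lexer: IDENTIFIER tokens plus the single characters < > , .
# Every '>' becomes its own token, so '>>>' yields three closers.
TOKEN_RE = re.compile(r"\s*(?:([A-Za-z_]\w*)|([<>,.]))")


def tokenize(src: str) -> List[str]:
    tokens: List[str] = []
    src = src.rstrip()
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1) or m.group(2))
        pos = m.end()
    return tokens


class TypeParser:
    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.i = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def expect(self, tok: str) -> None:
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.i += 1

    def identifier(self) -> str:
        tok = self.peek()
        if tok is None or not (tok[0].isalpha() or tok[0] == "_"):
            raise SyntaxError(f"expected IDENTIFIER, got {tok!r}")
        self.i += 1
        return tok

    def segment(self) -> str:
        # IDENTIFIER ( ( '<' ) => typeArguments )?
        # The predicate becomes a one-token lookahead check.
        name = self.identifier()
        if self.peek() == "<":
            name += self.type_arguments()
        return name

    def type_identifier(self) -> str:
        # segment ( '.' segment )*
        parts = [self.segment()]
        while self.peek() == ".":
            self.expect(".")
            parts.append(self.segment())
        return ".".join(parts)

    def type_arguments(self) -> str:
        # Assumed: '<' typeIdentifier ( ',' typeIdentifier )* '>'
        self.expect("<")
        args = [self.type_identifier()]
        while self.peek() == ",":
            self.expect(",")
            args.append(self.type_identifier())
        self.expect(">")  # recursion handles any nesting depth
        return "<" + ",".join(args) + ">"


def parse_type(src: str) -> str:
    # Parses one type and returns it with whitespace normalised.
    parser = TypeParser(tokenize(src))
    result = parser.type_identifier()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing token {parser.peek()!r}")
    return result
```

For instance, parse_type("java.util.Map<String, List<Integer>>")
returns "java.util.Map<String,List<Integer>>", and an unclosed
"List<String" raises SyntaxError.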

That's odd; the original rule shouldn't have been
ambiguous.  It could be something about how the
'<' character is being lexed -- bear in mind that 
by using it as a quoted literal in a parser rule 
you are effectively creating a new (unnamed) 
token.  It's usually easier to spot lexer 
ambiguity and fix it if you explicitly define all 
the lexer rules yourself and don't use any quoted 
literals in the parser.
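
As a rough illustration of why one explicit table of named
tokens helps, here is a minimal Python sketch that lists every
rule able to match a given string; more than one name flags an
ambiguity. The LEXER_RULES table (including a CLASS keyword)
and rules_matching are hypothetical, and this checks only one
sample string at a time, so it is no substitute for ANTLR's own
warnings.

```python
import re
from typing import Dict, List

# Hypothetical explicit lexer table: every token has a name, instead of
# anonymous quoted literals scattered through the parser rules.
LEXER_RULES: Dict[str, str] = {
    "LT": r"<",
    "GT": r">",
    "DOT": r"\.",
    "COMMA": r",",
    "CLASS": r"class",
    "IDENTIFIER": r"[A-Za-z_][A-Za-z0-9_]*",
}


def rules_matching(text: str, rules: Dict[str, str] = LEXER_RULES) -> List[str]:
    # Names of every rule whose pattern matches the whole of `text`,
    # in table order; two or more names mean the rules overlap there.
    return [name for name, pattern in rules.items() if re.fullmatch(pattern, text)]
```

Here rules_matching("<") gives ["LT"], but
rules_matching("class") gives ["CLASS", "IDENTIFIER"]: exactly
the kind of overlap that is easy to miss when some tokens exist
only as quoted literals inside parser rules.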