[antlr-interest] Problems with memory consumption when generating parsers

Sun Dec 13 18:28:23 PST 2009

2009/12/13 Gavin Lambert <antlr at mirality.co.nz>:
> At 07:37 14/12/2009, Marcin Rzeźnicki wrote:
>>Specifically I constructed a sort of catch-all rule which I
>>called LINEOFTEXT and was like ~('\n' | '\r')*. After
>>replacing that with simple .* LINETERMINATOR my problems went
>>away.
>
> Actually, the former is better than the latter (more specific) -- you were
> just missing some parentheses:
>  (~('\n' | '\r'))* LINETERMINATOR
>

Yep, sorry, I typed that from memory instead of pasting it, hence the error.
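A minimal sketch of the corrected rule in ANTLR 3 syntax, for reference. The LINETERMINATOR fragment, which accepts LF or CRLF, is an assumption for illustration and not taken from the original grammar:

```antlr
// Match every character except CR and LF, then the line terminator.
// The negated set can never consume CR or LF, so the loop cannot run
// past the end of the line.
LINEOFTEXT
    : (~('\n' | '\r'))* LINETERMINATOR
    ;

// Assumed definition: LF or CRLF. Declared as a fragment so it is only
// used inside other lexer rules and never emitted as a token of its own.
fragment LINETERMINATOR
    : '\r'? '\n'
    ;
```

Because the loop and the terminator start with disjoint character sets, ANTLR has no decision to make about where the text ends, whereas `.*` leaves that choice to the tool.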

>>ANTLR wasn't sure about typeArguments because they can be
>>arbitrarily nested (like in List<List<List<String>>>) so I
>>changed that to:
>>IDENTIFIER ( ( '<' ) => typeArguments )? ( '.' IDENTIFIER ( ( '<' ) => typeArguments )? )*
>>
>>because when I expect a typeIdentifier, '<' inevitably marks the beginning
>>of a type parameter list (I hope that's sound reasoning)
>
> That's odd, the original shouldn't have been ambiguous.  It could be
> something about how the '<' character is being lexed -- bear in mind that by
> using it as a quoted literal in a parser rule you are effectively creating a
> new (unnamed) token.  It's usually easier to spot lexer ambiguity and fix it
> if you explicitly define all the lexer rules yourself and don't use any
> quoted literals in the parser.
>
>

That's interesting. I think you have a point, and I was
wrong. I will try it without quoted literals.
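A minimal sketch of what that could look like in ANTLR 3, with every token defined as an explicit lexer rule and no quoted literals in the parser rules. The grammar name TypeRef, the token names LT, GT, DOT, COMMA and WS, and the IDENTIFIER definition are hypothetical, chosen for illustration; typeArguments is simplified to a comma-separated list of types, without wildcards or bounds:

```antlr
grammar TypeRef;  // hypothetical grammar name

// Parser rules refer only to named tokens.
typeIdentifier
    : IDENTIFIER typeArguments? (DOT IDENTIFIER typeArguments?)*
    ;

// Simplified: nested types separated by commas, e.g. List<List<String>>.
typeArguments
    : LT typeIdentifier (COMMA typeIdentifier)* GT
    ;

// Lexer rules spelled out explicitly so every token boundary is visible.
LT         : '<' ;
GT         : '>' ;
DOT        : '.' ;
COMMA      : ',' ;
IDENTIFIER : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')* ;
WS         : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;
```

With the tokens listed in one place, lexer conflicts become easier to spot. For example, if the full grammar also defined a shift operator such as '>>' as a single token, the lexer would match the longest token and turn the closing '>>' of List<List<String>> into one token instead of two GT tokens, which the typeArguments rule above could not parse.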

-- 
Greetings
Marcin Rzeźnicki