[antlr-interest] Problems with memory consumption when generating parsers

Sun Dec 13 10:37:03 PST 2009

Thank you very much, Jim, for offering your help, but I think I now know
what has been hitting me. You were right, it was the lexer. Specifically,
I had constructed a sort of catch-all rule, which I called LINEOFTEXT and
which was defined as ~('\n' | '\r')*. After replacing that with a simple
.* LINETERMINATOR my problems went away. So far so good. I also
replaced automatic backtracking with more fine-tuned syntactic
predicates. Specifically, I had a rule which matched type identifiers
with optional generics:
 IDENTIFIER (typeArguments )? ( '.' IDENTIFIER (typeArguments )? )*

ANTLR wasn't sure about typeArguments because they can be arbitrarily
nested (like in List<List<List<String>>>), so I changed that to:
 IDENTIFIER ( ( '<' ) => typeArguments )?
   ( '.' IDENTIFIER ( ( '<' ) => typeArguments )? )*

because, wherever I expect a typeIdentifier, a '<' inevitably marks the
beginning of a type parameter list (I hope that's sound reasoning).
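
As a rough sketch, the rule after this change could be laid out as below
(ANTLR 3 syntax). The rule name typeIdentifier and the bodies of
typeArguments and typeArgument are assumptions made for illustration; the
thread only shows the first rule's body.

```antlr
// Sketch only. typeIdentifier, typeArguments and typeArgument are assumed
// names/bodies; only the body of the first rule comes from this thread.
typeIdentifier
    : IDENTIFIER ( ('<') => typeArguments )?
      ( '.' IDENTIFIER ( ('<') => typeArguments )? )*
    ;

// Assumed shape: recursing through typeIdentifier is what allows
// arbitrary nesting such as List<List<List<String>>>.
// Also assumes the lexer hands '>' to the parser one character at a time.
typeArguments
    : '<' typeArgument ( ',' typeArgument )* '>'
    ;

typeArgument
    : typeIdentifier
    ;
```

The point of ('<') => is that the predicate only has to match a single
token, whereas the predicates generated by backtrack=true would
speculatively parse the whole, possibly deeply nested, typeArguments.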

ANTLR also had problems with the array indexer: '[' expression ']'. That
was probably some kind of FOLLOW set conflict, as every expression can
syntactically be followed by an array access (as in
someVar.method()[10]), so when it saw new int[10][15] it could not
decide whether it was an array creation followed by an array access or
not. I could forbid that in my grammar, but that was leading me to other
problems, so I decided to leave it as it is and told ANTLR to treat
every '[' that is not immediately followed by ']' as an indexer in the
array creation rule (which is correct, as Java forbids an array creation
from being followed by an array access):
 ( ( '[' ~']' ) => indexer )+ dims*
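
For context, here is a sketch of how that fragment might sit inside a
complete rule (ANTLR 3 syntax). The names arrayCreation and elementType,
and the bodies given for indexer and dims, are assumptions; only the
predicate line itself and '[' expression ']' are taken from the text
above.

```antlr
// Sketch only. arrayCreation and elementType are assumed names, and the
// bodies of indexer and dims are guesses at what the real rules contain.
arrayCreation
    : 'new' elementType ( ( '[' ~']' ) => indexer )+ dims*
    ;

indexer
    : '[' expression ']'    // a sized dimension, e.g. [10]
    ;

dims
    : '[' ']'               // an unsized dimension, e.g. the [] in new int[10][]
    ;
```

With new int[10][15] the predicate succeeds at both '[' tokens, so both
bracket pairs are consumed as indexers by the creation rule and nothing is
left over to be read as a trailing array access.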

On Sun, Dec 13, 2009 at 7:11 PM, Jim Idle <jimi at temporal-wave.com> wrote:
> The analysis can take a lot of memory and you may just need more stack space, but it could also be your grammar construction. Lexers especially can use a lot of memory to analyse, especially if you specify huge sets of 'valid' characters. I'll look at it if you send it to me.
>
> jim
>
>> -----Original Message-----
>> From: antlr-interest-bounces at antlr.org
>> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Marcin Rzeznicki
>> Sent: Sunday, December 13, 2009 8:26 AM
>> To: antlr-interest at antlr.org
>> Subject: [antlr-interest] Problems with memory consumption when
>> generating parsers
>>
>> Hi to all,
>> I am experiencing some problems with excessive memory usage when
>> generating my parser. I allocated 128MB of heap memory to ANTLRWorks
>> and it cannot complete generation of a parser for Java-like expressions.
>> I suspect this is a rather bad sign, but I am not sure whether I need, at
>> this point, to just allocate more memory and get over the issue or
>> start worrying. What do you think? Also, how can I find out which
>> parts of the grammar cause this? Are there any techniques which you
>> can recommend? I could post the grammar but I think it is too big for
>> the mailing list - but if someone would like to take a look then I'll
>> surely post it.
>> --
>> Greetings
>> Marcin Rzeźnicki
>>
>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address

-- 
Regards
Marcin Rzeźnicki