[antlr-interest] Java Heap Out of Memory exception

John B. Brodie jbb at acm.org
Wed Sep 8 18:16:58 PDT 2010


Greetings!

I have not fully studied your grammar, but I did notice one *BIG*
problem as noted below.

On Wed, 2010-09-08 at 17:38 -0700, J W wrote:
> Hopefully someone can help me figure out my issue.  I have a fairly 
> simple grammar.  If I verify my script using CommonTokenStream, I 
> receive a Java heap out of memory exception.  However, if I use the 
> ANTLR plugin to test with the same value, I do not receive the exception
>  and I see the NoViableAlt exception as expected.
> 
> Here is the block I'm verifying:
> 
> Action Group 3 Needs Attention
>    BranchHours
>    Contact A Branch
>       Call 1, Ack=Contact, NoConnect=Contact
>    EndContact
> EndAction
> 
> There are two errors with the script above.  The first is the Needs Attention text after Action Group 3.  The second is the Branch text after Contact A.  These were values entered during negative testing that caused the out of memory exception.  Yet, if I copy and paste this script into the Eclipse ANTLR interpreter and verify I see the NoViableAltException I'm expecting.

Your ANDOR lexer rule says that the *EMPTY* string is a valid token!

So whenever the lexer encounters a sequence of characters it cannot
recognize, it must emit the empty token (ANDOR in this case) before the
erroneous text, because you said the empty string is a valid token. When
the lexer hits invalid input, it must deliver every token that can
validly be matched (here, the empty ANDOR token) before announcing the
error condition.

I do not see any rule in your grammar that can tokenize the string
"Needs", so that is exactly what happens here.

So this is why you get an out of memory condition: there are infinitely
many empty ANDOR tokens in front of the illegal character sequence
"Needs", because you have said that the empty string is a valid token.
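
To make the loop concrete, here is a toy model in Java. It is only a
sketch and is NOT the code ANTLR generates: the class EmptyTokenDemo,
its methods, and the cap parameter are all invented for illustration.
It assumes a token stream that keeps buffering tokens until it reaches
end of input. The cap stands in for the heap limit, since without it
the first call would never return.

```java
import java.util.ArrayList;
import java.util.List;

public class EmptyTokenDemo {
    /**
     * Toy stand-in for the ANDOR rule (hypothetical, not ANTLR code).
     * Returns the length of the match at pos: 3 for "AND", 2 for "OR",
     * 0 for an empty match if allowEmpty is set, or -1 for no match.
     */
    public static int matchLength(String input, int pos, boolean allowEmpty) {
        if (input.startsWith("AND", pos)) return 3;
        if (input.startsWith("OR", pos)) return 2;
        return allowEmpty ? 0 : -1;
    }

    /**
     * Buffers tokens until end of input, a lexical error, or the cap.
     * The cap exists only so that the demonstration terminates.
     */
    public static List<String> tokenize(String input, boolean allowEmpty, int cap) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length() && tokens.size() < cap) {
            int len = matchLength(input, pos, allowEmpty);
            if (len < 0) {
                // No rule matches: report the error and stop.
                tokens.add("<error at " + pos + ">");
                break;
            }
            tokens.add(input.substring(pos, pos + len));
            pos += len; // a zero-length match never advances pos
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Empty match allowed: the buffer fills up to the cap.
        System.out.println(tokenize("ANDNeeds", true, 1000).size());
        // Empty match not allowed: one token, then an error.
        System.out.println(tokenize("ANDNeeds", false, 1000));
    }
}
```

With the empty match allowed, the buffer grows until it hits the cap
while the input position stays stuck in front of "Needs". Without it,
the loop stops at the bad text and reports an error, which is the
behaviour you were expecting.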

As for why the ANTLRWorks interpreter behaves differently: I do not
know, but it is probably because the interpreter does not do the same
thing that the generated code does. I believe this is a well-known
problem; search the mail archives on markmail to verify.

Avoid any lexer rule that recognizes the *EMPTY* string.


As an aside, it appears that you are trying to do too much work in
your lexer...

.....snipped.....
> ANDOR: (AND | OR)?;


Hope this helps...
   -jbb



