[antlr-interest] Stack Overflow for Tree grammar CSharp target

Sam Harwell sharwell at pixelminegames.com
Tue Sep 2 15:59:01 PDT 2008


Yikes, you just found a way to crash our product. I must say though, the
input required to do so was absurd! It took some 20 full lines of
3*3*3*3*3*...! LOL

My operator precedence parser is also vulnerable to this, but the tree
grammar will fail much earlier. By replacing the tail recursion in the
precedence tree builder with a hybrid stack/recurse routine, it could
handle very large input expressions. The main problem remains in the
tree grammar. I'll be working on this, but no real answer yet. For now,
I'm modified my expression rule to throw an OperationCanceledException
if the expression stack gets too deep. I catch that exception at a top
level, issue an error/warning, and toss the parse results. The
definition of _expressionDepth is in my separate partial class file.

expression
@init
{
	_expressionDepth++;
	if ( _expressionDepth > 50 )
		throw new OperationCanceledException( "Expression nested
too deeply." );
}
	:	assignExpr // <-- this is actually a giant switch now,
like yours is
	;
finally
{
	_expressionDepth--;
}

Sam

-----Original Message-----
From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Johannes Luber
Sent: Tuesday, September 02, 2008 4:58 PM
To: Greg Smolyn
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Stack Overflow for Tree grammar CSharp
target

Greg Smolyn schrieb:
> Hi,
> 
> I've got a rather large rule in a tree grammar (using the CSharp2
> target, ANTLR 3.1), that when faced with a large input tree tends to
> blow up with a StackOverflowException.
> This happens only with some pretty crazy (but unfortunately required)
> input that forces a very deep recursion.
> 
> The rule looks like 
> 
> expr 
>   : ^( ASSIGN expr expr )
>   | ^(MULT expr expr)
>   | ^(AND expr expr)
>   | ^(TYPEOF expr)
>  ... <snipped about 50 others>...
>   ; 
> 
> 
> This generates an expr() method with quite a few local variables, and
> which can only handle about ~200 recursions  (ie. given a tree like
"(*
> (* (* (* (* (* .... (* 2 3) ... 4) 5) 6) 7) 8) 9)").
> There seem to be about 3 local variable references for every case,
which
> means the thing is putting at least 150 pointers onto the stack at
every
> recursive step.  
> For example:
>
<http://www.julielist.com/x.xxx?p=54.000.I32p2.1018147.107-102&url=http:
//www.4realcash.com/affiliates/gallhit/1130677/163/7/2/0>CommonTree
> ASSIGN76 = null;
> CommonTree MULT79 = null;
> ...
> expr_return expr77 = default(expr_return);
> expr_return expr78 = default(expr_return);
> ...
> CommonTree ASSIGN76_tree=null;
> CommonTree MULT79_tree=null;
> 
> Would this be a bug or future optimization?  Am I going to have to go
> down the path of changing my parser to emit some imaginary AST nodes
> that can group these into smaller categories (ie. additiveExprs, etc).
>  I'd rather not do the latter as it would mean changing a whole pile
of
> other tree grammars using the generated ASTs. 
> 
> Interestingly I see that really none of those variables are actually
> used outside of their respective case statements.  Could not these
> variables be moved there, reducing the load for recursive calls?
> 
> Would mucking around in the StringTemplates for the csharp2 target be
> something worth trying?  Or is this something even higher up in the
> antlr AST generation?
> 
> Thanks for any insights!
> 
> -greg

Maybe <http://antlr.org/wiki/display/ANTLR3/Operator+precedence+parser>
works here?

Johannes
> 
>
------------------------------------------------------------------------
> 
> 
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe:
http://www.antlr.org/mailman/options/antlr-interest/your-email-addr
ess
> 


List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe:
http://www.antlr.org/mailman/options/antlr-interest/your-email-addr
ess



More information about the antlr-interest mailing list