[antlr-interest] Compiler-exploding grammar

Steve Bennett stevagewp at gmail.com
Wed Nov 28 16:39:18 PST 2007


On 11/29/07, Andy Tripp <antlr at jazillian.com> wrote:
> I don't follow. Are you saying that ANTLR generated a single line that
> was 14000 characters
> wide, and javac choked on it? If so, that sounds like a javac bug to me.

No, it's more than a single line and apparently more than 14000
characters wide, too. My editor shows about 30 lines in a row at least
that long, but it truncates them at that 14384 characters.

Looking into it a bit, it looks like some sort of combinatorial
expansion gone wrong caused by the combination of two parser rules.
Rule #1:

close_bold_italics
@after {text_bold=false; text_italics = false;}
:
        {text_bold==true && text_italics==true}? =>  -> B_OFF I_OFF
       |{text_bold==false && text_italics==true}? => -> I_OFF
       |{text_bold==true && text_italics==false}? => -> B_OFF

       ;

bold_and_italics:
     {textis("''") && text_italics}? => APOSTROPHES
{text_italics=false;} ->         ^(I_OFF)
    |{textis("''") && !text_italics}? => APOSTROPHES
{text_italics=true;} ->          ^(I_ON)
    |{textis("'''") && text_bold}? => APOSTROPHES
{text_bold=false;} ->            ^(B_OFF)
    |{textis("'''") && !text_bold}? => APOSTROPHES   {text_bold=true;}
->             ^(B_ON)
    |{textis("''''") && text_bold}? => APOSTROPHES
{text_bold=false;} -> APOSTROPHE ^(B_OFF)
    |{textis("''''") && !text_bold}? => APOSTROPHES  {text_bold=true;}
-> APOSTROPHE  ^(B_ON)
    |{textis("'''''") && text_bold && text_italics}? =>  APOSTROPHES
{text_bold=false; text_italics=false; } -> ^(B_OFF) ^(I_OFF)
    |{textis("'''''") && text_bold && !text_italics}? => APOSTROPHES
{text_bold=false; text_italics=true; }  -> ^(B_OFF) ^(I_ON)
    |{textis("'''''") && !text_bold && text_italics}? => APOSTROPHES
{text_bold=true; text_italics=false; }  -> ^(B_ON)  ^(I_OFF)
    |{textis("'''''") && !text_bold && !text_italics}? =>APOSTROPHES
{text_bold=true; text_italics=true; }   -> ^(B_ON)  ^(I_ON)
    ;

The resulting mess:

(text_bold==false && text_italics==true&&textis("''''") && !text_bold)
||(text_bold==true && text_italics==false&&textis("'''") && text_bold)
||(text_bold==true && text_italics==false&&textis("'''''") &&
text_bold && !text_italics)
||(text_bold==true && text_italics==true&&textis("''''") && !text_bold)
||(text_bold==true && text_italics==true&&textis("'''''") &&
!text_bold && !text_italics)
...

going through *all* the combinations, even the nonsensical ones:
...
||(text_bold==true && text_italics==true&&textis("'''") && !text_bold)
...

Even so, I don't really get why it explodes quite so massively. I also
can't really say why these two rules are even joined in this cartesian
product like that. The only thing they really have in common is that
they produce the same AST node type.

Steve


More information about the antlr-interest mailing list