[antlr-interest] Compiler-exploding grammar

Terence Parr parrt at cs.usfca.edu
Wed Nov 28 17:03:07 PST 2007


Hi. Try 3.1 daily build (only java at moment).  I think i fixed  
something in that area.
Ter
On Nov 28, 2007, at 4:39 PM, Steve Bennett wrote:

> On 11/29/07, Andy Tripp <antlr at jazillian.com> wrote:
>> I don't follow. Are you saying that ANTLR generated a single line  
>> that
>> was 14000 characters
>> wide, and javac choked on it? If so, that sounds like a javac bug  
>> to me.
>
> No, it's more than a single line and apparently more than 14000
> characters wide, too. My editor shows about 30 lines in a row at least
> that long, but it truncates them at that 14384 characters.
>
> Looking into it a bit, it looks like some sort of combinatorial
> expansion gone wrong caused by the combination of two parser rules.
> Rule #1:
>
> close_bold_italics
> @after {text_bold=false; text_italics = false;}
> :
>         {text_bold==true && text_italics==true}? =>  -> B_OFF I_OFF
>        |{text_bold==false && text_italics==true}? => -> I_OFF
>        |{text_bold==true && text_italics==false}? => -> B_OFF
>
>        ;
>
> bold_and_italics:
>      {textis("''") && text_italics}? => APOSTROPHES
> {text_italics=false;} ->         ^(I_OFF)
>     |{textis("''") && !text_italics}? => APOSTROPHES
> {text_italics=true;} ->          ^(I_ON)
>     |{textis("'''") && text_bold}? => APOSTROPHES
> {text_bold=false;} ->            ^(B_OFF)
>     |{textis("'''") && !text_bold}? => APOSTROPHES   {text_bold=true;}
> ->             ^(B_ON)
>     |{textis("''''") && text_bold}? => APOSTROPHES
> {text_bold=false;} -> APOSTROPHE ^(B_OFF)
>     |{textis("''''") && !text_bold}? => APOSTROPHES  {text_bold=true;}
> -> APOSTROPHE  ^(B_ON)
>     |{textis("'''''") && text_bold && text_italics}? =>  APOSTROPHES
> {text_bold=false; text_italics=false; } -> ^(B_OFF) ^(I_OFF)
>     |{textis("'''''") && text_bold && !text_italics}? => APOSTROPHES
> {text_bold=false; text_italics=true; }  -> ^(B_OFF) ^(I_ON)
>     |{textis("'''''") && !text_bold && text_italics}? => APOSTROPHES
> {text_bold=true; text_italics=false; }  -> ^(B_ON)  ^(I_OFF)
>     |{textis("'''''") && !text_bold && !text_italics}? =>APOSTROPHES
> {text_bold=true; text_italics=true; }   -> ^(B_ON)  ^(I_ON)
>     ;
>
> The resulting mess:
>
> (text_bold==false && text_italics==true&&textis("''''") && !text_bold)
> ||(text_bold==true && text_italics==false&&textis("'''") && text_bold)
> ||(text_bold==true && text_italics==false&&textis("'''''") &&
> text_bold && !text_italics)
> ||(text_bold==true && text_italics==true&&textis("''''") && ! 
> text_bold)
> ||(text_bold==true && text_italics==true&&textis("'''''") &&
> !text_bold && !text_italics)
> ...
>
> going through *all* the combinations, even the nonsensical ones:
> ...
> ||(text_bold==true && text_italics==true&&textis("'''") && !text_bold)
> ...
>
> Even so, I don't really get why it explodes quite so massively. I also
> can't really say why these two rules are even joined in this cartesian
> product like that. The only thing they really have in common is that
> they produce the same AST node type.
>
> Steve



More information about the antlr-interest mailing list