[antlr-interest] Q: How do I left-factor this?

Austin Hastings Austin_Hastings at Yahoo.com
Tue Nov 13 03:01:05 PST 2007


I'm trying to recognize, in the lexer, a (recursively nested) block of 
code, comments, and strings as a single token. I'm building the inverse 
of an island grammar -- a "hole" grammar? -- a grammar with a hole in the 
middle, like a donut. I have a rule to recognize a code block, thus:

CODE_BLOCK
    : NestedCodeBlock
      { setText(getText().substring(1, getText().length() - 1)); }
    ;

fragment MultiLineComment : '/*' .* '*/';
fragment SingleLineComment : '//' ~('\r' | '\n')* '\r'? '\n';
fragment NestedCodeBlock
    : '{'
        (options {greedy=false;}
        : MultiLineComment
        | NestedCodeBlock
        | SingleLineComment
        | QUOTED_LITERAL
        | .
        )*
      '}'
    ;
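
For comparison, here is a minimal hand-written sketch in plain Java of the
matching the rule above is meant to perform. It is not ANTLR output and not
an answer to the left-factoring question; it only pins down the intended
behaviour. The class and method names (CodeBlockScanner, endOfBlock,
blockBody) are hypothetical. QUOTED_LITERAL is not shown in this post, so the
sketch assumes a double-quoted string with backslash escapes, and it treats a
// comment as running to the next '\n'.

```java
/** Hypothetical helper, not part of ANTLR: scans one brace-delimited block. */
final class CodeBlockScanner {
    private CodeBlockScanner() {}

    /**
     * Returns the index just past the '}' that closes the block opening at
     * src[start], or -1 if src[start] is not '{' or the block never closes.
     */
    static int endOfBlock(String src, int start) {
        if (start < 0 || start >= src.length() || src.charAt(start) != '{') {
            return -1;
        }
        int depth = 0;
        int i = start;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (src.startsWith("/*", i)) {
                // MultiLineComment: braces inside it do not count.
                int close = src.indexOf("*/", i + 2);
                if (close < 0) return -1;      // unterminated comment
                i = close + 2;
            } else if (src.startsWith("//", i)) {
                // SingleLineComment: like the grammar rule, requires a newline.
                int eol = src.indexOf('\n', i + 2);
                if (eol < 0) return -1;
                i = eol + 1;
            } else if (c == '"') {
                // Assumed shape of QUOTED_LITERAL (not defined in the post).
                i = skipString(src, i);
                if (i < 0) return -1;          // unterminated string
            } else {
                if (c == '{') {
                    depth++;                   // nested block opens
                } else if (c == '}') {
                    depth--;
                    if (depth == 0) return i + 1;
                }
                i++;                           // any other character
            }
        }
        return -1;                             // ran out of input
    }

    /** Returns the index just past the closing quote, or -1 if unterminated. */
    private static int skipString(String src, int openQuote) {
        int i = openQuote + 1;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\\') {
                i += 2;                        // skip the escaped character
            } else if (c == '"') {
                return i + 1;
            } else {
                i++;
            }
        }
        return -1;
    }

    /** Mirrors the setText(...) action: the block text without outer braces. */
    static String blockBody(String src, int start) {
        int end = endOfBlock(src, start);
        return end < 0 ? null : src.substring(start + 1, end - 1);
    }
}
```

Each branch of the loop corresponds to one alternative of the subrule
(comment, nested block, string, or any other character). The overlap is
visible here too: the final "any other character" branch would also accept
'/', '"' and '{' if the earlier checks were not tried first.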

The problem is that ANTLR complains about non-LL(*) left recursion in 
alternatives 1, 2 and 5, and suggests left-factoring them. I have 
found that adding the option k=2 makes the problem (apparently) go away.

Can anyone tell me how I would "left factor" these alternatives? Is 
ANTLR really just asking for a predicate because the wildcard 
alternative (.) overlaps some of the others? I had thought that having 
the concrete '{' token out front would prevent any recursion issues with 
NestedCodeBlock. What did I miss?

=Austin


