[antlr-interest] Avoiding warnings without code bloat

Mon Jun 25 16:33:29 PDT 2007

Problem Solved!

Page 285 of the ANTLR book implies that you cannot suppress
warnings in ANTLR v3 the way you could in v2.

However, it appears that a semantic predicate works nicely as a workaround:

// Strings
SQ_STRING: '\''! ({true}? ESC_SEQ | ~'\'')* '\''!;
DQ_STRING: '"'!  ({true}? ESC_SEQ | ~'"')* '"'!;
BQ_STRING: '`'!  ({true}? ESC_SEQ | ~'`')* '`'!;
fragment ESC_SEQ:
	  '\\r' {$text = "\r";}
	| '\\n' {$text = "\n";}
	| '\\t' {$text = "\t";}
	| '\\a' {$text = "\a";}
	| '\\b' {$text = "\b";}
	| '\\f' {$text = "\f";}
	| '\\0' {$text = "\0";}
	| '\\u' HEXDIGIT HEXDIGIT HEXDIGIT HEXDIGIT
		{
			char ch = (char)int.Parse(Text, System.Globalization.NumberStyles.HexNumber);
			$text = new string(ch, 1);
		}
	| '\\'! '\''
	| '\\'! '\"'
	| '\\'! '\`';

The generated code now uses LL(2) lookahead for no reason and
contains some redundant code. For example, code that originally read

            	    if ( (LA19_0 == '\"') )
            	    {
            	        alt19 = 1;
            	    }

now reads

            	    if ( (LA19_0 == '\"') )
            	    {
            	        int LA19_1 = input.LA(2);
            	        if ( (true) )
            	        {
            	            alt19 = 1;
            	        }
            	    }

However, its behavior appears to be the same.
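
To see why the extra test should be harmless, here is a hand-written C#
approximation of the two decision shapes above. The class and method names
and the default value of 0 are made up for illustration; this is not the
generated code itself.

```csharp
using System;

static class PredictDemo
{
    // Shape of the original decision: one character of lookahead.
    static int Original(char la0)
    {
        int alt = 0;                 // assumed default: no alternative chosen
        if (la0 == '"') { alt = 1; }
        return alt;
    }

    // Shape after adding {true}?: a second character is read but never inspected.
    static int WithPredicate(char la0, char la1)
    {
        int alt = 0;
        if (la0 == '"')
        {
            if (true) { alt = 1; }   // la1 does not influence the result
        }
        return alt;
    }

    static void Main()
    {
        // Compare both shapes on a few sample characters.
        foreach (char c in new[] { '"', 'a', '\\' })
        {
            Console.WriteLine(Original(c) == WithPredicate(c, 'x'));
        }
    }
}
```

In this sketch both methods pick the same alternative for every input; the
only difference is the unused second character of lookahead.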

The C# compiler will emit some "unreachable code" warnings (CS0162).
You can disable them like this:

grammar Expr;
options {
	language=CSharp;
}
@lexer::members {
	#pragma warning disable 0162
}
@parser::members {
	#pragma warning disable 0162
}
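
A note on scope: in C#, "#pragma warning disable" stays in effect until the
end of the file unless a matching "restore" appears, so a pragma placed near
the top of the generated class should cover the rule methods that follow it.
A standalone sketch of that behavior (PragmaDemo and Pick are hypothetical
names, unrelated to ANTLR output):

```csharp
using System;

static class PragmaDemo
{
    static int Pick()
    {
        int alt = 1;
#pragma warning disable 0162   // CS0162 (unreachable code) is silenced from the next line on
        if (false)
        {
            alt = 2;           // unreachable, but no warning is reported here
        }
#pragma warning restore 0162   // CS0162 is reported again after this line
        return alt;
    }

    static void Main()
    {
        Console.WriteLine(Pick()); // prints 1
    }
}
```

Without the restore line, the warning would simply stay off for the rest of
that source file.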

I think there may be a caveat: using {true}? in front of a nullable rule
can lead to an infinite loop if there is a syntax error in the input
stream (i.e. don't write "{true}? foo" if foo can match the empty string).

> I'm trying to match strings with escape sequences, so I tried this:
>
> // Strings
> SQ_STRING: '\''! (ESC_SEQ | ~'\'')* '\''!;
> DQ_STRING: '"'!  (ESC_SEQ | ~'"' )* '"'!;
> fragment ESC_SEQ:
>         | '\\r' {$text = "\r";}
...
>         | '\\'! '"';
>
> But this produces 12 warnings, and this makes sense because in the
> first three lines, escape sequences can match the first and second
> alternatives.
...
> I can eliminate all warnings using a syntactic predicate:
>
> SQ_STRING: '\''! ((ESC_SEQ)=>ESC_SEQ | ~'\'')* '\''!;
> DQ_STRING: '"'!  ((ESC_SEQ)=>ESC_SEQ | ~'\"')* '"'!;
>
> However, this changes the generated code substantially; not only does
> the lexer test for an ESC_SEQ before matching it, but ALL lexer rules,
> including rules that are in no way related to strings, have additional
> lines of code such as "if (failed) return ;" sprinkled throughout
> them.
>
> So my question is, can I get the "bloat-free" behavior of the original code:
...
> while suppressing the warnings?