[antlr-interest] Avoiding warnings without code bloat

David Piepgrass qwertie256 at gmail.com
Mon Jun 25 11:48:27 PDT 2007


I'm trying to match strings with escape sequences, so I tried this:

// Strings
SQ_STRING: '\''! (ESC_SEQ | ~'\'')* '\''!;
DQ_STRING: '"'!  (ESC_SEQ | ~'"' )* '"'!;
fragment ESC_SEQ:
	| '\\r' {$text = "\r";}
	| '\\n' {$text = "\n";}
	| '\\t' {$text = "\t";}
	| '\\a' {$text = "\a";}
	| '\\b' {$text = "\b";}
	| '\\f' {$text = "\f";}
	| '\\0' {$text = "\0";}
	| '\\u' HEXDIGIT HEXDIGIT HEXDIGIT HEXDIGIT {
		char ch = (char)int.Parse(Text,
			System.Globalization.NumberStyles.HexNumber);
		Text = new string(ch, 1); // System.String has no (char) constructor
	}
	| '\\'! '\''
	| '\\'! '\"'
	| '\\'! '\`';

This produces 12 warnings, which makes sense: in the string rules, the
backslash that starts an escape sequence can be matched by both the
first alternative (ESC_SEQ) and the second (the negated character).
So next I tried the following:

SQ_STRING: '\''! (ESC_SEQ | ~('\'' | '\\'))* '\''!; // line 57
DQ_STRING: '"'!  (ESC_SEQ | ~('\"' | '\\'))* '"'!;

This is not actually what I want because it appears that an invalid
escape sequence like \Q cannot be parsed by the above rules. I tried
it because I thought it would get rid of the warnings, but 3 warnings
were still produced for each kind of string. Can someone explain these
warnings? I do not see how '\'' can be matched in more than one way.

warning(200): Expr.g:57:43: Decision can match input such as "'\''" using multiple alternatives: 1, 3
As a result, alternative(s) 3 were disabled for that input
warning(200): Expr.g:57:43: Decision can match input such as "{'\u0000'..'&', '('..'\uFFFE'}" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(201): Expr.g:57:43: The following alternatives are unreachable: 2,3
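
To make the overlap between the two loop alternatives concrete, here is a
small Python sketch. It is not ANTLR output, and the names
matching_alternatives and NEGATED_SET are made up for illustration. It
assumes ESC_SEQ always begins with a backslash (that is, it ignores the empty
alternative that the leading `|` after `fragment ESC_SEQ:` may introduce) and
that HEXDIGIT means [0-9A-Fa-f], which this post does not show. It models only
the backslash overlap described earlier and does not attempt to reproduce the
warnings quoted above.

```python
import re

# Escape sequences accepted by ESC_SEQ in the grammar above. HEXDIGIT is not
# shown in this post, so [0-9A-Fa-f] is an assumption.
ESC_SEQ = re.compile(r"\\(?:[rntabf0'\"`]|u[0-9A-Fa-f]{4})")

def matching_alternatives(text, pos, exclude_backslash):
    """List the alternatives of (ESC_SEQ | ~set) that can match at text[pos]."""
    alts = []
    if ESC_SEQ.match(text, pos):
        alts.append("ESC_SEQ")
    # The negated set: ~'\'' in the first attempt, ~('\'' | '\\') in the second.
    excluded = "'\\" if exclude_backslash else "'"
    if pos < len(text) and text[pos] not in excluded:
        alts.append("NEGATED_SET")
    return alts

if __name__ == "__main__":
    for sample in (r"\n", r"\Q", "a"):
        for flag in (False, True):
            print(repr(sample), flag, matching_alternatives(sample, 0, flag))
```

Under these assumptions, a valid escape such as \n is claimed by both
alternatives in the first attempt, and only by ESC_SEQ once the backslash is
excluded from the negated set. The invalid escape \Q is then claimed by
neither alternative, which matches the concern about the second attempt.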

I can eliminate all warnings using a syntactic predicate:

SQ_STRING: '\''! ((ESC_SEQ)=>ESC_SEQ | ~'\'')* '\''!;
DQ_STRING: '"'!  ((ESC_SEQ)=>ESC_SEQ | ~'\"')* '"'!;

However, this changes the generated code substantially; not only does
the lexer test for an ESC_SEQ before matching it, but ALL lexer rules,
including rules that are in no way related to strings, have additional
lines of code such as "if (failed) return ;" sprinkled throughout
them.
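
For readers unfamiliar with syntactic predicates, the following Python sketch
shows the general try-then-rewind idea behind (ESC_SEQ)=>ESC_SEQ. It is a
loose model, not the code ANTLR generates: the class StringScanner and its
method names are hypothetical, HEXDIGIT is assumed to be [0-9A-Fa-f], and the
"failed" bookkeeping mentioned above is reduced to a boolean return value.

```python
import re

# Same assumed shape for ESC_SEQ as in the grammar above.
ESC_SEQ = re.compile(r"\\(?:[rntabf0'\"`]|u[0-9A-Fa-f]{4})")

class StringScanner:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def match_esc_seq(self):
        """Consume one ESC_SEQ at the current position; report success."""
        m = ESC_SEQ.match(self.text, self.pos)
        if m is None:
            return False
        self.pos = m.end()
        return True

    def synpred_esc_seq(self):
        """Speculatively try ESC_SEQ, then rewind whatever the outcome."""
        start = self.pos
        ok = self.match_esc_seq()
        self.pos = start
        return ok

    def scan_sq_string(self):
        """Scan '...' and return the (alternative, lexeme) steps taken."""
        if not self.text.startswith("'", self.pos):
            raise ValueError("expected opening quote")
        self.pos += 1
        steps = []
        while self.pos < len(self.text) and self.text[self.pos] != "'":
            start = self.pos
            if self.synpred_esc_seq():   # the predicate: look first...
                self.match_esc_seq()     # ...then match for real
                steps.append(("ESC_SEQ", self.text[start:self.pos]))
            else:
                self.pos += 1            # fall back to the negated character
                steps.append(("NOT_QUOTE", self.text[start:self.pos]))
        if self.pos >= len(self.text):
            raise ValueError("unterminated string")
        self.pos += 1                    # closing quote
        return steps
```

In this model the escape sequence is examined twice, once speculatively and
once for real, which is the extra work the predicate adds to the string rules
themselves. An invalid escape such as \Q simply falls through to the
negated-character alternative one character at a time.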

So my question is, can I get the "bloat-free" behavior of the original code:

SQ_STRING: '\''! (ESC_SEQ | ~'\'')* '\''!;
DQ_STRING: '"'!  (ESC_SEQ | ~'"' )* '"'!;

while suppressing the warnings?

