[antlr-interest] Avoiding warnings without code bloat

David Piepgrass qwertie256 at gmail.com
Mon Jun 25 15:07:32 PDT 2007


> Consult the examples for ways of doing this, you will find that the C
> parser and Java parser are set up to handle this.
> Jim

Actually, the C and Java parsers seem to do exactly what I tried to
do! Look at this from the C example:

STRING_LITERAL
    :  '"' ( EscapeSequence | ~('\\'|'"') )* '"'
    ;
fragment
EscapeSequence
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    |   OctalEscape
    ;

And the following is from Java.g:

StringLiteral
    :  '"' ( EscapeSequence | ~('\\'|'"') )* '"'
    ;
fragment
EscapeSequence
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    |   UnicodeEscape
    |   OctalEscape
    ;

Compare what I tried:

SQ_STRING: '\''! (ESC_SEQ | ~('\'' | '\\'))* '\''!;
DQ_STRING: '"'!  (ESC_SEQ | ~('\"' | '\\'))* '"'!;
fragment ESC_SEQ:
	| '\\r' {$text = "\r";}
	| '\\n' {$text = "\n";}
	| '\\t' {$text = "\t";}
	| '\\a' {$text = "\a";}
	| '\\b' {$text = "\b";}
	| '\\f' {$text = "\f";}
	| '\\0' {$text = "\0";}
	| '\\u' HEXDIGIT HEXDIGIT HEXDIGIT HEXDIGIT { ... }
	| '\\'! '\''
	| '\\'! '\"'
	| '\\'! '\`';

warning(200): Expr.g:68:41: Decision can match input such as "'\''"
using multiple alternatives: 1, 3
As a result, alternative(s) 3 were disabled for that input
warning(200): Expr.g:68:41: Decision can match input such as
"{'\u0000'..'&', '('..'\uFFFE'}" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(201): Expr.g:68:41: The following alternatives are unreachable: 2,3
warning(200): Expr.g:69:40: Decision can match input such as "'"'"
using multiple alternatives: 1, 3
As a result, alternative(s) 3 were disabled for that input
warning(200): Expr.g:69:40: Decision can match input such as
"{'\u0000'..'!', '#'..'\uFFFE'}" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
warning(201): Expr.g:69:40: The following alternatives are unreachable: 2,3

If the C example gives no warnings, I don't see what is different about
my grammar.

But as I pointed out, I would like to know how to accept inputs with
invalid escapes like "\Q". A simple solution would be to add this
extra alt at the end of ESC_SEQ:

	| '\\' .;

But this produces a crapload of warnings. This can be avoided by writing

	| '\\' ~('r'|'n'|'t'|'a'|'b'|'f'|'0'|'u'|'\''|'"'|'`');

instead, but it's a tedious solution (and it doesn't generalize very
well to more complicated scenarios).
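
To make the goal concrete, here is a rough sketch in Python (not ANTLR) of
the behaviour the lexer is meant to have on the body of a string: known
escapes are decoded, and an unknown escape such as "\Q" is accepted and left
in the text unchanged. The function name unescape_lenient and the table
SIMPLE_ESCAPES are made up for this illustration. Two things are assumptions
rather than anything stated above: that the elided '\\u' action decodes four
hex digits to one character, and that an unknown escape should keep both the
backslash and the following character.

```python
# Hypothetical helper, not part of the grammar: it only illustrates the
# intended result of lexing a string body when unknown escapes are accepted.

# Characters that have a defined meaning after a backslash. This is the same
# set that the ~(...) alternative has to exclude by hand (minus 'u').
SIMPLE_ESCAPES = {
    "r": "\r", "n": "\n", "t": "\t", "a": "\a", "b": "\b",
    "f": "\f", "0": "\0", "'": "'", '"': '"', "`": "`",
}
HEX_DIGITS = set("0123456789abcdefABCDEF")


def unescape_lenient(body):
    """Decode known escapes in body; pass unknown escapes through as-is."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        # Ordinary character, or a lone backslash at the very end.
        if ch != "\\" or i + 1 >= len(body):
            out.append(ch)
            i += 1
            continue
        nxt = body[i + 1]
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 2
        elif (nxt == "u" and len(body) >= i + 6
              and all(c in HEX_DIGITS for c in body[i + 2:i + 6])):
            # Assumption: \uXXXX stands for the character with that code.
            out.append(chr(int(body[i + 2:i + 6], 16)))
            i += 6
        else:
            # Unknown escape such as \Q: keep backslash and character.
            out.append(ch + nxt)
            i += 2
    return "".join(out)
```

For example, unescape_lenient(r"\Q\n") returns a backslash, a Q, and a
newline. The single else branch is the part that the '\\' . alternative is
supposed to express in the grammar without listing every known escape again.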

