[antlr-interest] newbie question, escaped characters

Richard Clark rdclark at gmail.com
Tue Mar 11 22:14:01 PDT 2008


I have a better answer (courtesy of a long drive where I had time to think).

I suggested "k=2;" because ANTLR 2 generates LL(k) recognizers: the
lexer looks ahead "k" characters when resolving ambiguities, and the
default k is 1. In your case, that leading '\\' starts more than one
alternative, so with k=1 the lexer resolves the ambiguity in favor of
the first lexer rule using it. Raising k fixes that, but it makes the
generated code more complex and is a bit like swatting flies with a
sledgehammer.
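
To make the lookahead point concrete, here is a toy model in Python. It
is not ANTLR's actual prediction code, and the two alternative names are
hypothetical; it only shows why one character of lookahead cannot tell
apart two alternatives that both start with a backslash, while two
characters can.

```python
def predict(text, pos, alternatives, k):
    """Return the names of the alternatives whose first k characters
    match the k characters of input starting at pos."""
    window = text[pos:pos + k]
    return [name for name, prefix in alternatives if prefix[:k] == window]


# Hypothetical alternatives; both begin with a single backslash.
ALTERNATIVES = [("ESCAPED_STAR", "\\*"), ("ESCAPED_QUESTION", "\\?")]

# The input is the two characters: backslash, question mark.
one_char = predict("\\?", 0, ALTERNATIVES, k=1)  # both still match
two_char = predict("\\?", 0, ALTERNATIVES, k=2)  # only one matches
```

With k=1 both alternatives survive, and a generator that breaks the tie
in favor of the first one would pick the wrong rule here.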

Rather than alter the lookahead, it's simpler to collapse the
decisions into one rule and set the token text by hand for your
two special cases. You should be able to write this:

protected SIMPLETERM: (TERM_CHAR)+;

protected TERM_CHAR: SIMPLE_TERM_CHAR | ESCAPED_TERM_CHAR;

protected SIMPLE_TERM_CHAR:  ~( ' ' | '\t' | '!' | '(' | ')' | ':' |
'^' | '[' | ']' | '\\' | '\"' | '{' | '}' | '~' | '/' | '\r' | '\n' );

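// The '!' suffix matches the backslash but drops it from the token text.
protected ESCAPED_TERM_CHAR:  '\\'! (
    // \* and \? keep their backslash, so put it back by hand
    '*' { $setText("\\*"); }
 |  '?' { $setText("\\?"); }
    // every other escaped character ends up in the text without it
 | '\\' | '+'  | '-' | '!' | '(' | ')' | ':' | '^' |  '[' | ']' | '\"'
 | '{' | '}' | '~' |  '/'
);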
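
If it helps to see what token text these rules should produce, here is a
rough Python sketch of the same logic. It is an emulation under my own
assumptions, not ANTLR output: the function name and the ValueError
behavior are made up for illustration, and a real generated lexer
reports errors differently.

```python
# Characters that SIMPLE_TERM_CHAR excludes (they end a term unless escaped).
SPECIAL = set(' \t!()[]:^\\"{}~/\r\n')
# Characters that may follow the backslash in ESCAPED_TERM_CHAR.
ESCAPABLE = set('*?\\+-!():^[]"{}~/')
# Escapes whose backslash is restored in the token text.
KEEP_BACKSLASH = {'*', '?'}


def simple_term_text(source, pos=0):
    """Return (token_text, end_pos) for one term starting at pos."""
    out = []
    i = pos
    while i < len(source):
        ch = source[i]
        if ch == '\\':
            # Escaped character: the backslash is dropped except for * and ?
            if i + 1 >= len(source) or source[i + 1] not in ESCAPABLE:
                raise ValueError("bad escape at position %d" % i)
            nxt = source[i + 1]
            out.append('\\' + nxt if nxt in KEEP_BACKSLASH else nxt)
            i += 2
        elif ch in SPECIAL:
            break  # an unescaped special character ends the term
        else:
            out.append(ch)
            i += 1
    if not out:
        raise ValueError("expected at least one term character")
    return ''.join(out), i
```

For example, the input foo\:bar should come out as foo:bar, while a\*b
should keep its backslash and stay a\*b.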



That should do it. (By the way, ANTLR 3 replaces $setText("foo"); with
$text = "foo"; )

 ...Richard

