[antlr-interest] newbie question, escaped characters

Rob Shields rob at cmsnet.org.uk
Wed Mar 12 11:03:45 PDT 2008


Richard Clark wrote:
> I have a better answer (courtesy of a long drive where I had time to think.)
> 
> I suggested "k=2;" because ANTLR 2 is an LL(k) parser generator -- it
> looks ahead "k" symbols (characters, in a lexer) when resolving
> ambiguities and the default k is 1. In your case, it's looking at that
> leading '\\' in more than one place and resolves the ambiguity in favor
> of the first lexer rule using it. But raising k makes the resulting code
> more complex and is a bit like swatting flies with a sledgehammer.

That's what I thought. I was a bit hesitant to change k in case it had 
side effects.

> Rather than alter the lookahead, it's simpler to collapse the
> decisions into one rule and alter the text in the token for your
> couple of special cases.  You should be able to write this:
> 
> protected SIMPLETERM: (TERM_CHAR)+;
> 
> protected TERM_CHAR: SIMPLE_TERM_CHAR | ESCAPED_TERM_CHAR;
> 
> protected SIMPLE_TERM_CHAR:  ~( ' ' | '\t' | '!' | '(' | ')' | ':' |
> '^' | '[' | ']' | '\\' | '\"' | '{' | '}' | '~' | '/' | '\r' | '\n' );
> 
> protected ESCAPED_TERM_CHAR:  '\\'! (
>     '*' { $setText("\\*"); }
>  |  '?' { $setText("\\?"); }
>  | '\\' | '+'  | '-' | '!' | '(' | ')' | ':' | '^' |  '[' | ']' | '\"'
> | '{' | '}' | '~' |  '/'
> );

Excellent. I have tried that and can confirm that it works. I'm really 
pleased, thank you :)
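
For anyone reading along, here is a rough hand-written Java sketch of what the collapsed rules do to the token text. It is not ANTLR-generated code, and the names EscapedTermSketch and readTerm are made up for illustration. It assumes the '!' after the backslash drops the backslash from the token text, and that the two $setText calls put it back for '*' and '?'. Unlike a generated lexer, which would report an error on a backslash followed by an unexpected character, this sketch simply stops reading there, and it does not enforce the "at least one character" part of (TERM_CHAR)+.

```java
// Hand-written illustration only; ANTLR 2 would generate different code.
class EscapedTermSketch {
    // Characters excluded by SIMPLE_TERM_CHAR (they end a term unless escaped).
    private static final String SPECIAL = " \t!():^[]\\\"{}~/\r\n";
    // Characters that ESCAPED_TERM_CHAR accepts after a backslash.
    private static final String ESCAPABLE = "*?\\+-!():^[]\"{}~/";

    // Reads one SIMPLETERM from the start of input and returns its token text.
    static String readTerm(String input) {
        StringBuilder text = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\\') {
                // ESCAPED_TERM_CHAR: backslash plus one escapable character.
                if (i + 1 >= input.length()
                        || ESCAPABLE.indexOf(input.charAt(i + 1)) < 0) {
                    break; // simplification: stop instead of raising an error
                }
                char next = input.charAt(i + 1);
                if (next == '*' || next == '?') {
                    text.append('\\'); // mirrors $setText("\\*") and $setText("\\?")
                }
                text.append(next); // for the other characters the backslash is dropped
                i += 2;
            } else if (SPECIAL.indexOf(c) < 0) {
                // SIMPLE_TERM_CHAR: any character outside the excluded set.
                text.append(c);
                i++;
            } else {
                break; // unescaped special character ends the term
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(readTerm("foo\\:bar baz")); // prints foo:bar
        System.out.println(readTerm("a\\*b"));         // prints a\*b
    }
}
```

So an escaped ':' ends up in the term as a plain ':', while an escaped '*' or '?' keeps its backslash, presumably so that whatever consumes the term can still tell it apart from a wildcard.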

> That should do it. (By the way, ANTLR 3 replaces $setText("foo"); with
> $text = "foo"; )
> 
>  ...Richard

Well, $setText("foo"); seems to work, so I guess I must be using ANTLR 2. 
The jar file I have is from 2004 or thereabouts.

Rob


