[antlr-interest] newbie question, escaped characters

Wed Mar 12 11:27:06 PDT 2008

Rob Shields wrote:
> Richard Clark wrote:
>> I have a better answer (courtesy of a long drive where I had time to 
>> think.)
>>
>> I suggested "k=2;" because ANTLR 2 generates LL(k) recognizers: the
>> lexer looks ahead "k" characters when resolving ambiguities, and the
>> default k is 1. In your case, that leading '\\' starts more than one
>> alternative, and with k=1 the ambiguity is resolved in favor of the
>> first lexer rule that uses it. But raising k makes the generated code
>> more complex and is a bit like swatting flies with a sledgehammer.
>
> That's what I thought. I was a bit hesitant to change k in case it had 
> side effects.
>
>> Rather than alter the lookahead, it's simpler to collapse the
>> decisions into one rule and alter the text in the token for your
>> couple of special cases.  You should be able to write this:
>>
>> protected SIMPLETERM: (TERM_CHAR)+;
>>
>> protected TERM_CHAR: SIMPLE_TERM_CHAR | ESCAPED_TERM_CHAR;
>>
>> protected SIMPLE_TERM_CHAR:  ~( ' ' | '\t' | '!' | '(' | ')' | ':' |
>> '^' | '[' | ']' | '\\' | '\"' | '{' | '}' | '~' | '/' | '\r' | '\n' );
>>
>> protected ESCAPED_TERM_CHAR:  '\\'! (
>>     '*' { $setText("\\*"); }
>>  |  '?' { $setText("\\?"); }
>>  | '\\' | '+'  | '-' | '!' | '(' | ')' | ':' | '^' |  '[' | ']' | '\"'
>> | '{' | '}' | '~' |  '/'
>> );
>
> Excellent. I have tried that and can confirm that it works. I'm really 
> pleased, thank you :)
>
>> That should do it. (By the way, ANTLR 3 replaces $setText("foo"); with
>> $text = "foo"; )
>>
>>  ...Richard
>
> Well, $setText("foo"); seems to work, so I guess I must be using
> ANTLR 2. The jar file I have is from 2004 or thereabouts.
>
> Rob
>
One warning for ANTLR 3 users: setting the text in a fragment rule (the
ANTLR 3 equivalent of protected) does not seem to have any effect on
the generated token.
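
For anyone who wants to check what token text Richard's rules are meant to
produce without running ANTLR, here is a rough hand-written sketch in
Python. It is not ANTLR output and not anyone's actual code from this
thread; the function name scan_simple_term and the three character sets
are my own, copied by hand from the quoted SIMPLE_TERM_CHAR and
ESCAPED_TERM_CHAR rules. It assumes the backslash is dropped for every
escaped character except '*' and '?', which keep it (that is what the two
$setText calls do). Unlike a generated lexer, it simply stops at an
invalid escape instead of reporting an error.

```python
# Characters allowed after a backslash (copied from ESCAPED_TERM_CHAR).
ESCAPABLE = set('*?\\+-!():^[]"{}~/')
# Characters that end a term unless escaped (copied from SIMPLE_TERM_CHAR).
TERMINATORS = set(' \t!():^[]\\"{}~/\r\n')
# Escapes whose backslash stays in the token text (the $setText cases).
KEEP_BACKSLASH = set('*?')


def scan_simple_term(text, pos=0):
    """Return (token_text, next_pos) for the term starting at pos.

    Hypothetical helper: mimics (TERM_CHAR)+ but stops quietly on a
    lone or invalid backslash rather than raising a lexer error.
    """
    out = []
    i = pos
    while i < len(text):
        ch = text[i]
        if ch == '\\':
            # An escape needs a following character from ESCAPABLE.
            if i + 1 >= len(text) or text[i + 1] not in ESCAPABLE:
                break
            nxt = text[i + 1]
            # '\*' and '\?' keep the backslash; everything else drops it.
            out.append('\\' + nxt if nxt in KEEP_BACKSLASH else nxt)
            i += 2
        elif ch in TERMINATORS:
            break
        else:
            out.append(ch)
            i += 1
    return ''.join(out), i


if __name__ == '__main__':
    for sample in (r'a\:b c', r'foo\*', 'x\\\\y'):
        print(repr(sample), '->', scan_simple_term(sample))
```

The point of doing it in one pass is the same as in the grammar: there is
only one place that ever looks at the backslash, so no extra lookahead is
needed to decide which rule it belongs to.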