[antlr-interest] newbie question, escaped characters
shmuel siegel
antlr at shmuelhome.mine.nu
Wed Mar 12 11:27:06 PDT 2008
Rob Shields wrote:
> Richard Clark wrote:
>> I have a better answer (courtesy of a long drive where I had time to
>> think.)
>>
>> I suggested "k=2;" because ANTLR 2 generates LL(k) recognizers: it
>> looks ahead "k" symbols (characters, in a lexer) when resolving
>> ambiguities, and the default k is 1. In your case, the leading '\\'
>> can start more than one alternative, and with k=1 the ambiguity is
>> resolved in favor of the first lexer rule using it. Raising k fixes
>> that, but it makes the generated code more complex and is a bit like
>> swatting flies with a sledgehammer.
>
> That's what I thought. I was a bit hesitant to change k in case it had
> side effects.
>
>> Rather than alter the lookahead, it's simpler to collapse the
>> decisions into one rule and alter the text in the token for your
>> couple of special cases. You should be able to write this:
>>
>> protected SIMPLETERM: (TERM_CHAR)+;
>>
>> protected TERM_CHAR: SIMPLE_TERM_CHAR | ESCAPED_TERM_CHAR;
>>
>> protected SIMPLE_TERM_CHAR: ~( ' ' | '\t' | '!' | '(' | ')' | ':' |
>> '^' | '[' | ']' | '\\' | '\"' | '{' | '}' | '~' | '/' | '\r' | '\n' );
>>
>> protected ESCAPED_TERM_CHAR: '\\'! (
>> '*' { $setText("\\*"); }
>> | '?' { $setText("\\?"); }
>> | '\\' | '+' | '-' | '!' | '(' | ')' | ':' | '^' | '[' | ']' | '\"'
>> | '{' | '}' | '~' | '/'
>> );
>
> Excellent. I have tried that and can confirm that it works. I'm really
> pleased, thank you :)
>
>> That should do it. (By the way, ANTLR 3 replaces $setText("foo"); with
>> $text = "foo"; )
>>
>> ...Richard
>
> Well $setText("foo"); seems to work so I guess I must be using ANTLR
> 2. The jar file I have is from 2004 or thereabouts.
>
> Rob
>
Just an ANTLR3 warning. Setting text in fragment rules (the ANTLR3
equivalent of protected) does not seem to have any effect on the
generated token.
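
One way to sidestep this in ANTLR 3 may be to do the rewriting outside
the fragment rule, for example on the text of the finished token. What
follows is only a minimal sketch in plain Java of the same rewriting
that ESCAPED_TERM_CHAR performs in the grammar quoted above. The class
and method names (TermUnescaper, unescape) are made up for illustration
and are not part of ANTLR, and the sketch is not wired into a generated
lexer here.

```java
public final class TermUnescaper {
    // Characters that may follow a backslash, copied from the
    // ESCAPED_TERM_CHAR alternatives in the quoted grammar.
    private static final String ESCAPABLE = "*?\\+-!():^[]\"{}~/";

    private TermUnescaper() {}

    // Hypothetical helper: drops the escaping backslash, except before
    // '*' and '?', where the backslash is kept (as $setText does above).
    public static String unescape(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            boolean isEscape = c == '\\'
                    && i + 1 < raw.length()
                    && ESCAPABLE.indexOf(raw.charAt(i + 1)) >= 0;
            if (isEscape) {
                char next = raw.charAt(++i);   // consume the escaped char
                if (next == '*' || next == '?') {
                    out.append('\\');          // keep the backslash for wildcards
                }
                out.append(next);
            } else {
                out.append(c);                 // ordinary character, copied as-is
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("foo\\:bar"));   // prints foo:bar
        System.out.println(unescape("a\\*b\\?c"));   // prints a\*b\?c
    }
}
```

In an ANTLR 3 grammar the call would presumably go in the action of the
non-fragment rule that produces the token, or in the code that consumes
the token text afterwards; which of the two fits depends on the rest of
the grammar.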