[antlr-interest] Using ! operator in lexer rules

Fri Apr 20 04:36:44 PDT 2007

Replying to my own post with a more fleshed-out example grammar. My  
question is, does the ! operator no longer have any effect in lexer  
rules in ANTLR 3?

Here's an example ANTLR 3 grammar that you can paste into
ANTLRWorks and run under the debugger with input such as:

"foo" "bar" "baz"

grammar T;
words : (WORD { System.out.println("WORD: " + $WORD.text); } )* EOF ;
WORD : '"'! 'a'..'z'+ '"'!;
WS : (' ')+ { $channel = HIDDEN; } ;

I would expect the WORD tokens to have the values foo, bar and baz,
but instead they have "foo", "bar" and "baz" (i.e. with the quotes).

To make this work under ANTLR 3 I have to avoid the ! operator and
take a route like this instead:

grammar T;
words : (WORD { System.out.println("WORD: " + $WORD.text); } )* EOF ;
fragment CONTENT : 'a'..'z'+;
WORD : '"' t=CONTENT '"' { setText($t.text); };
WS : (' ')+ { $channel = HIDDEN; } ;

This correctly prints: foo, bar and baz (no enclosing quotes).
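Stripped of the ANTLR machinery, the effect of that action can be sketched in plain Java: the rule still matches the full text including the quotes, and the token text is then replaced with only the inner part. This is a stand-alone illustration under that assumption; the class and the `stripQuotes` helper are hypothetical names, not part of the ANTLR runtime.

```java
// Hypothetical stand-alone sketch: mimics what the WORD rule's action
// achieves, i.e. keeping only the text between the enclosing quotes.
public class StripQuotes {

    /** Returns the text between the enclosing double quotes of a token
     *  such as "foo"; input without enclosing quotes is returned unchanged. */
    static String stripQuotes(String matched) {
        if (matched.length() >= 2
                && matched.charAt(0) == '"'
                && matched.charAt(matched.length() - 1) == '"') {
            return matched.substring(1, matched.length() - 1);
        }
        return matched;
    }

    public static void main(String[] args) {
        // The raw text each WORD token matches for the input "foo" "bar" "baz"
        String[] matched = { "\"foo\"", "\"bar\"", "\"baz\"" };
        for (String m : matched) {
            System.out.println("WORD: " + stripQuotes(m));
        }
    }
}
```

Running it prints `WORD: foo`, `WORD: bar` and `WORD: baz`, the same output the debugger session should show with the fragment-rule workaround.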

Here's the equivalent ANTLR 2 grammar which uses the ! operator and  
also correctly prints the input words without the enclosing quotes:

class TParser extends Parser;
words : (w:WORD { System.out.println("WORD: " + w.getText()); } )*;
class TLexer extends Lexer;
WORD : '\"'! ('a'..'z')+ '\"'! ;
WS : (' ')+ { $setType(Token.SKIP); } ;
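A rough model of what ! means in an ANTLR 2 lexer rule: every element's characters go into the token's text buffer as they are matched, and an element marked with ! has its characters removed again, so they never appear in the token text. The sketch below assumes that behaviour and is not ANTLR-generated code; `BangModel`, `Element` and `tokenText` are hypothetical names.

```java
import java.util.List;

// Hypothetical model of the ANTLR 2 lexer '!' behaviour; an illustration,
// not the code ANTLR 2 generates. Each element's characters are appended
// to the token text buffer, and a '!' element is truncated back out.
public class BangModel {

    /** One matched element of a lexer rule: its text and whether it carried '!'. */
    static final class Element {
        final String text;
        final boolean bang;

        Element(String text, boolean bang) {
            this.text = text;
            this.bang = bang;
        }
    }

    static String tokenText(List<Element> elements) {
        StringBuilder text = new StringBuilder();
        for (Element e : elements) {
            int saveIndex = text.length(); // buffer length before this element
            text.append(e.text);           // "match" the element
            if (e.bang) {
                text.setLength(saveIndex); // '!' discards what was just matched
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // WORD : '\"'! ('a'..'z')+ '\"'! ;  applied to the input "foo"
        List<Element> word = List.of(
                new Element("\"", true),
                new Element("foo", false),
                new Element("\"", true));
        System.out.println("WORD: " + tokenText(word)); // prints WORD: foo
    }
}
```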

So am I correct that the function of the ! operator in lexer rules
has changed in the move from ANTLR 2 to ANTLR 3, or am I
misunderstanding the way it did/does work?

Cheers,
Wincent

On 19/4/2007, at 8:56, Wincent Colaiuta wrote:

> This[1] wonderful ANTLR 2 tutorial shows the use of a ! operator in  
> lexer rules to omit characters from a token's text value:
>
> CHARLIT : '\''! . '\''! ; // enclosing quotes omitted
>
> I've tried this in ANTLR 3 and the operator seems to have no effect  
> at all. The new Definitive ANTLR Reference shows many examples  
> using the operator, but only in the context of parser rules which  
> output AST nodes.
>
> As an example, I have an escaped backslash token. I want it to  
> recognize the \\ sequence in the input stream and emit a token  
> whose text value is just \. The only way I've found to do this is  
> to manually invoke setText from within an action:
>
> ESCAPED_BACKSLASH : BACKSLASH BACKSLASH { setText("\\"); };
>
> The simpler alternative doesn't work:
>
> ESCAPED_BACKSLASH : BACKSLASH! BACKSLASH;
>
> Likewise for literals:
>
> ESCAPED_BACKSLASH : '\\'! '\\';
>
> So is it correct that the ! operator no longer works in lexer rules  
> in ANTLR 3, or am I missing something here?
>
> Cheers,
> Wincent
>
> [1] <http://javadude.com/articles/antlrtut/>
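For the ESCAPED_BACKSLASH case in the quoted message, note that the Java literal "\\" passed to setText is a single backslash character, so the token text shrinks from two characters to one. The hand-written scanner below is a hypothetical sketch of that same transformation outside ANTLR; `EscapedBackslash` and `unescapeBackslashes` are illustrative names only.

```java
// Hypothetical hand-written scanner: replaces each two-character
// sequence \\ with a single backslash, the same text the quoted
// ESCAPED_BACKSLASH rule produces via setText("\\").
public class EscapedBackslash {

    static String unescapeBackslashes(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < input.length() && input.charAt(i + 1) == '\\') {
                out.append('\\'); // two input characters become one
                i += 2;
            } else {
                out.append(c);    // anything else passes through unchanged
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "a\\\\b"; // the characters a \ \ b
        String result = unescapeBackslashes(input);
        System.out.println(input + " (" + input.length() + " chars) -> "
                + result + " (" + result.length() + " chars)");
    }
}
```

Running it prints `a\\b (4 chars) -> a\b (3 chars)`.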