[antlr-interest] Using ! operator in lexer rules
Wincent Colaiuta
win at wincent.com
Fri Apr 20 04:36:44 PDT 2007
Replying to my own post with a more fleshed-out example grammar. My
question is: does the ! operator no longer have any effect in lexer
rules in ANTLR 3?
Here's the example ANTLR 3 grammar, which you can paste into
ANTLRWorks and run under the debugger with input such as: "foo"
"bar" "baz"
grammar T;
words : (WORD { System.out.println("WORD: " + $WORD.text); } )* EOF ;
WORD : '"'! 'a'..'z'+ '"'!;
WS : (' ')+ { $channel = HIDDEN; } ;
I would expect the WORD tokens to have the values foo, bar and baz,
but instead they have "foo", "bar" and "baz" (i.e., with the quotes).
To make this work under ANTLR 3, I have to take a different route
and do without the ! operator:
grammar T;
words : (WORD { System.out.println("WORD: " + $WORD.text); } )* EOF ;
fragment CONTENT : 'a'..'z'+;
WORD : '"' t=CONTENT '"' { setText($CONTENT.text); };
WS : (' ')+ { $channel = HIDDEN; } ;
This correctly prints: foo, bar and baz (no enclosing quotes).
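The net effect of that workaround can be pictured in plain Java. The class QuoteStripper and its method stripEnclosingQuotes are hypothetical illustrations, not part of the ANTLR runtime; the sketch only mirrors the text that setText($CONTENT.text) leaves in each WORD token, namely the matched text with its enclosing quotes dropped.

```java
// Hypothetical helper, NOT part of the ANTLR runtime: mirrors the text
// that the WORD rule's setText($CONTENT.text) action leaves in the token.
public class QuoteStripper {
    // Returns the text between a pair of enclosing double quotes, or the
    // input unchanged if it is not wrapped in such a pair.
    static String stripEnclosingQuotes(String matched) {
        if (matched.length() >= 2
                && matched.charAt(0) == '"'
                && matched.charAt(matched.length() - 1) == '"') {
            return matched.substring(1, matched.length() - 1);
        }
        return matched;
    }

    public static void main(String[] args) {
        // Simulated WORD matches for the input: "foo" "bar" "baz"
        String[] matches = { "\"foo\"", "\"bar\"", "\"baz\"" };
        for (String m : matches) {
            System.out.println("WORD: " + stripEnclosingQuotes(m));
        }
    }
}
```

Inside the grammar itself, an action such as setText(getText().substring(1, getText().length() - 1)); at the end of the WORD rule should give the same result without the CONTENT fragment, assuming the Java target's Lexer.getText() returns the text matched so far for the current token.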
Here's the equivalent ANTLR 2 grammar which uses the ! operator and
also correctly prints the input words without the enclosing quotes:
class TParser extends Parser;
words : (w:WORD { System.out.println("WORD: " + w.getText()); } )*;
class TLexer extends Lexer;
WORD : '\"'! ('a'..'z')+ '\"'! ;
WS : (' ')+ { $setType(Token.SKIP); } ;
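As a rough model of what the ! suffix does in an ANTLR 2 lexer rule, each matched element can be thought of as carrying a flag saying whether its characters are appended to the token's text. The BangModel class, its Element type and tokenText method are hypothetical illustrations, not ANTLR 2's generated code, which manages its text buffer differently; the model also assumes matching has already succeeded.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model, NOT ANTLR's generated code.
public class BangModel {
    // One element matched by a lexer rule, plus whether the grammar
    // marked it with ! (e.g. '\"'!).
    static final class Element {
        final String text;
        final boolean bang;

        Element(String text, boolean bang) {
            this.text = text;
            this.bang = bang;
        }
    }

    // Builds the token text: elements marked with ! were matched,
    // but their characters are not appended.
    static String tokenText(List<Element> matched) {
        StringBuilder sb = new StringBuilder();
        for (Element e : matched) {
            if (!e.bang) {
                sb.append(e.text);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // WORD : '\"'! ('a'..'z')+ '\"'! ;  applied to the input "foo"
        List<Element> word = Arrays.asList(
                new Element("\"", true),
                new Element("foo", false),
                new Element("\"", true));
        System.out.println(tokenText(word)); // prints foo
    }
}
```

Under ANTLR 3, as the examples above show, the quotes stay in the token text, so the same result has to come from an explicit setText action.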
So am I correct that the function of the ! operator in lexer rules
has changed in the move from ANTLR 2 to ANTLR 3, or am I
misunderstanding the way it did/does work?
Cheers,
Wincent
On 19/4/2007, at 8:56, Wincent Colaiuta wrote:
> This[1] wonderful ANTLR 2 tutorial shows the use of a ! operator in
> lexer rules to omit characters from a token's text value:
>
> CHARLIT : '\''! . '\''! ; // enclosing quotes omitted
>
> I've tried this in ANTLR 3 and the operator seems to have no effect
> at all. The new Definitive ANTLR Reference shows many examples
> using the operator, but only in the context of parser rules which
> output AST nodes.
>
> As an example, I have an escaped backslash token. I want it to
> recognize the \\ sequence in the input stream and emit a token
> whose text value is just \. The only way I've found to do this is
> to manually invoke setText from within an action:
>
> ESCAPED_BACKSLASH : BACKSLASH BACKSLASH { setText("\\"); };
>
> The simpler alternative doesn't work:
>
> ESCAPED_BACKSLASH : BACKSLASH! BACKSLASH;
>
> Likewise for literals:
>
> ESCAPED_BACKSLASH : '\\'! '\\';
>
> So is it correct that the ! operator no longer works in lexer rules
> in ANTLR 3, or am I missing something here?
>
> Cheers,
> Wincent
>
> [1] <http://javadude.com/articles/antlrtut/>