[antlr-interest] The NOT (~) Operator

Gavin Lambert antlr at mirality.co.nz
Sat Apr 12 06:14:38 PDT 2008


At 20:10 12/04/2008, Sven Busse wrote:
 >INDENTATION
 >	:	TAB* ~NEWLINE;
 >
 >NEWLINE
 >	:	'\r'? '\n';
 >
 >fragment
 >TAB	:	'\t';
 >
 >Checking the grammar with ANTLRWorks gives me this error:
 >
 >simpletest.g:0:0: syntax error: buildnfa: <AST>:6:11: unexpected
 >AST node: ?
 >
 >The problem seems to relate to the "~NEWLINE", because if i delete
 >it, i get no error. Also, if i change the "INDENTATION" to a parser
 >rule "indentation", i get no error, but that is not an option for me.
 >
 >Can someone explain to me, what the reason behind this error is?

In the lexer, "~" inverts a "set" of characters (a group of single 
character alternatives).  It cannot be used on a "sequence" (one 
or more characters following another character).
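
To illustrate the set/sequence distinction, here is a minimal sketch 
in ANTLR v3 syntax.  The rule names are made up for the example and 
are not part of the grammar above:

```antlr
// Valid: every alternative inside the parentheses is a single
// character, so the group is a set that "~" can invert.
fragment
NOT_NEWLINE_CHAR
   : ~('\r' | '\n')
   ;

// Valid: a character range is also a set.
fragment
NOT_DIGIT
   : ~('0'..'9')
   ;

// Not valid: '\r' '\n' is a two-character sequence, not a set.
// BAD
//    : ~('\r' '\n')
//    ;
```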

In the parser, "~" similarly operates on "sets", but this time 
they're sets of tokens.  Just like the lexer, though, it can't be 
used on a sequence of tokens.

Since NEWLINE is a single token, it's valid to invert it at the 
parser level (you're saying "any token except NEWLINE").  At the 
lexer level you can't invert it, though: there NEWLINE is the 
sequence '\r'? '\n' rather than a set, so ~NEWLINE would have to 
mean "any single character except the sequence of '\n' optionally 
preceded by a '\r'", which doesn't really make sense.
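
For comparison, a parser-level use of "~NEWLINE" might look like the 
sketch below.  The rule name "line" is hypothetical; it only assumes 
the NEWLINE token defined in the quoted grammar:

```antlr
// Parser rule (lower-case name).  "~NEWLINE" matches any one token
// other than NEWLINE, so this matches a run of tokens up to and
// including the end of the line.
line
   : (~NEWLINE)* NEWLINE
   ;
```

Here "~" is inverting a set containing the single token NEWLINE, 
which is why the same text is accepted in a parser rule but not in 
a lexer rule.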

In this case, your best bet is probably to spell it out 
explicitly:

INDENTATION
   : TAB ~('\r' | '\n')
   ;

... but bear in mind that this will consume whichever non-newline 
character follows the tab (making it part of the INDENTATION 
token), which may not be what you really want.
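
If swallowing that extra character is a problem, one possible 
alternative (a sketch only, not tested against your grammar) is to 
match just the tabs and leave whatever follows for another rule.  
This assumes it's acceptable for INDENTATION to also match tabs on 
otherwise empty lines or in the middle of a line:

```antlr
// Matches one or more tabs and nothing else; the next character
// is left in the input for whichever lexer rule matches it.
INDENTATION
   : TAB+
   ;

fragment
TAB
   : '\t'
   ;
```

The parser would then have to decide what an INDENTATION token 
directly followed by NEWLINE means.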


