[antlr-interest] Simple grammar with error

Sun Sep 16 04:07:08 PDT 2007

At 22:09 16/09/2007, Johannes Luber wrote:
 >Additionally, in many languages a particular
 >operator has overloaded meanings. An example:
 >
 >addressof_expression
 >	:	BITWISE_AND unary_expression
 >	;
 >
 >where
 >
 >BITWISE_AND : '&';
 >
 >Whenever I read BITWISE_AND I have to replace it with '&' and
 >reparse it as OP_ADDRESS. I can't use it in the grammar itself
 >because there won't be OP_ADDRESS tokens. Of course, I could
 >do a rewrite, but it may not be worth the effort. In such cases,
 >I'd wish ANTLR would allow to map BITWISE_AND and OP_ADDRESS to
 >the same token (although the debugger may be confused).

This is just an example of choosing the wrong name for the 
token.  Given that lexically an '&' might be a "bitwise and" or it 
might be an "address-of operator" (and the lexer will have no idea 
which one), the best name for the token would just be something 
like AMP.  Leave it to parser rules to assign more semantic 
meaning to it.  Like so:

tokens {
   AMP = '&';
   BITWISE_AND;
   OP_ADDRESS;
}
...
addressof_expression
   : AMP unary_expression -> ^(OP_ADDRESS unary_expression)
   ;

It's also cleaner (especially when looking at the generated 
code or at error messages) to see the token referred to as AMP 
rather than by an auto-generated name such as T31.
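
The binary use of the same token can be handled the same way. 
As a rough sketch only: it assumes options { output=AST; } is 
set, and equality_expression is a hypothetical operand rule 
defined elsewhere in the grammar (neither appears in the snippet 
above).

```antlr
// Sketch, not a complete grammar. Assumes output=AST and the tokens
// section shown earlier; equality_expression is a made-up rule name.
and_expression
   : (a=equality_expression -> $a)
     ( AMP b=equality_expression
         // $and_expression is the tree built so far by this rule
         -> ^(BITWISE_AND $and_expression $b)
     )*
   ;
```

With this shape, a & b & c should nest to the left. Either way 
the lexer only ever produces AMP; OP_ADDRESS and BITWISE_AND 
exist purely as imaginary nodes in the tree.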

(Also, if you're modelling a C++-like language that supports 
operator overloading, even BITWISE_AND isn't necessarily a good 
name, since '&' is an overloadable operator and so might end up 
doing something completely different.)