[antlr-interest] A proposal for keywords

Fri May 26 22:19:01 PDT 2006

> Are you saying that you recommend creating my own AST class rather  
> than use "!" in a lexer rule? Do you believe that the following  
> rule is bad form?
>
> STRING: '"'! ( ESCAPE | ~('"'|'\\') )* '"'! ;

I'm not actually using an ANTLR-generated lexer, but mine has a
custom token type deriving from antlr.CommonToken. It preserves the
original text, e.g. input.substring(startoffset, endoffset), and
additionally provides a "parsed text". So I'm all for the "!"
operator, as that is usually much easier than stripping the
characters later, but I think the text as it originally appeared in
the input should be preserved. Otherwise you get error messages
where the user thinks "huh, I didn't type that".
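
To make the "original text plus parsed text" idea concrete, here is a
minimal sketch. It is not my actual lexer code: DualTextToken,
StringLiterals and the STRING constant are made-up names, the class is
a standalone stand-in (in an ANTLR 2 setup it would presumably extend
antlr.CommonToken), and the escape handling only knows \n and \t.

```java
// Hypothetical stand-in for a custom token type. It does not extend
// antlr.CommonToken so that the sketch compiles without ANTLR.
final class DualTextToken {
    private final int type;
    private final String originalText; // exactly what the user typed
    private final String parsedText;   // quotes stripped, escapes resolved

    DualTextToken(int type, String input, int startOffset, int endOffset,
                  String parsedText) {
        this.type = type;
        this.originalText = input.substring(startOffset, endOffset);
        this.parsedText = parsedText;
    }

    int getType() { return type; }
    String getOriginalText() { return originalText; }
    String getParsedText() { return parsedText; }
}

final class StringLiterals {
    static final int STRING = 1; // hypothetical token type number

    // Builds a token for the string literal at input[start, end),
    // where both offsets include the surrounding double quotes.
    static DualTextToken stringToken(String input, int start, int end) {
        StringBuilder parsed = new StringBuilder();
        for (int i = start + 1; i < end - 1; i++) {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < end - 1) {
                c = input.charAt(++i);
                // Assumption: only \n and \t are translated; any other
                // escaped character is taken literally.
                if (c == 'n') c = '\n';
                else if (c == 't') c = '\t';
            }
            parsed.append(c);
        }
        return new DualTextToken(STRING, input, start, end, parsed.toString());
    }
}
```

An error message would then quote getOriginalText(), while the parser
and tree walkers work with getParsedText().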

Somewhat related: I'm not entirely decided on what is better for the
AST types. There are three options for carrying a payload along. The
first is a rather complicated class with many fields for the
different payloads, e.g.
AST {
   AType aPayload;
   BType bPayload;
   ...
}
where only one field (or at least not all of them) is ever set,
depending on the type of the AST node. The second possibility is
AST {
   Object payload;
}
where the payload can be of different types, again depending on the
node type. The last option is
ATypeAST {
   AType payload;
}
BTypeAST {
   BType payload;
}
using the heterogeneous AST feature.

I currently tend to think the second option is probably best. In all
three cases you have to infer the payload's type from the node type,
so there's hardly any difference in safety. The first one creates
really complicated classes without adding much convenience, except
for saving one cast. The third option requires casting and creating
several different AST classes. The second option doesn't require many
AST classes and brings just as much type safety as the other ones
(none), but doesn't take away any of the flexibility. With Java 1.5
you might also have:
AST<X extends Object> {
   X payload;
}
though that doesn't change much.
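
For illustration, the second option could look roughly like the
following sketch. All names here (PayloadNode, LiteralSummer and the
node type constants) are invented for the example and are not part of
ANTLR. Note that the cast is justified only by the node type check,
which is the same knowledge you need in the other two variants.

```java
import java.util.ArrayList;
import java.util.List;

// One node class for the whole tree; the payload is an untyped Object.
final class PayloadNode {
    // Hypothetical node types for a tiny expression language.
    static final int PLUS = 1;
    static final int INT_LITERAL = 2;
    static final int IDENTIFIER = 3;

    private final int type;
    private final Object payload; // Integer, String or null, by node type
    private final List<PayloadNode> children = new ArrayList<PayloadNode>();

    PayloadNode(int type, Object payload) {
        this.type = type;
        this.payload = payload;
    }

    // Appends a child and returns this node to allow chaining.
    PayloadNode add(PayloadNode child) {
        children.add(child);
        return this;
    }

    int getType() { return type; }
    Object getPayload() { return payload; }
    List<PayloadNode> getChildren() { return children; }
}

final class LiteralSummer {
    // Adds up all integer literals in the subtree rooted at node.
    static int sum(PayloadNode node) {
        int total = 0;
        if (node.getType() == PayloadNode.INT_LITERAL) {
            // The node type is the only thing that makes this cast safe.
            total += (Integer) node.getPayload();
        }
        for (PayloadNode child : node.getChildren()) {
            total += sum(child);
        }
        return total;
    }
}
```

With the generic variant the children would still have to be declared
with an unknown payload type, so a walker like this would end up with
the same check and cast.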

Has anyone got an opinion on that? It might help in defining the ANTLR API.

Martin