[antlr-interest] QUESTION on: How do I handle abbreviated keywords?

Fri Oct 31 23:36:05 PDT 2008

At 18:53 1/11/2008, Ben Gillis wrote:
 >In my CSharp2 target, there *already* are both components necessary
 >for this dictionary: the string values of the tokens and the
 >corresponding integer token types.

No, there isn't :)

If you have a token called BEGIN, the string you'll find in 
tokenNames is "BEGIN" -- which is a different string from "begin", 
the keyword in the input language that you actually want to 
match.  And it's different again from "bgn", which might be a 
valid abbreviation for the same keyword (and thus should translate 
to the same token type).

The token name is purely arbitrary -- it could be called 
BEGIN_KEYWORD, or KW_BEGIN, or BLOCK_START, or even FOO.  While 
it's usually convenient to give it a name similar to what it will 
match in the input language, there's no requirement to do so -- 
and that's especially true of imaginary tokens, which don't 
match anything in the input language at all (or at least not 
directly).

There's no way this kind of information can be generated 
automatically -- hence if you want to do things that way, then you 
have to do them yourself :)
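
For what it's worth, a hand-built table along those lines might look 
roughly like the sketch below.  It's in Java rather than C# just for 
illustration, and everything in it is an assumption: the class name 
KeywordTable, the method tokenTypeFor, and the numeric token types 
are all made up (in a real project you'd use the constants from your 
generated lexer), and "bgn" is only the example abbreviation from 
above.  Anything that isn't in the table falls back to a hypothetical 
ID token.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a hand-maintained keyword dictionary: maps each spelling that
// may appear in the input (including abbreviations) to a token type.
final class KeywordTable {
    // Hypothetical token types; real values would come from the generated lexer.
    static final int ID = 4;
    static final int BEGIN = 5;
    static final int END = 6;

    private static final Map<String, Integer> KEYWORDS = new HashMap<>();

    static {
        // Several spellings may map to the same token type.
        KEYWORDS.put("begin", BEGIN);
        KEYWORDS.put("bgn", BEGIN);   // abbreviation, same token as "begin"
        KEYWORDS.put("end", END);
    }

    // Returns the keyword token type for the text, or ID if it is not a keyword.
    static int tokenTypeFor(String text) {
        Integer type = KEYWORDS.get(text);
        return type != null ? type : ID;
    }

    public static void main(String[] args) {
        System.out.println(tokenTypeFor("begin"));
        System.out.println(tokenTypeFor("bgn"));
        System.out.println(tokenTypeFor("BEGIN"));
    }
}
```

Note that the lookup is on the input text, not on the token name: 
"begin" and "bgn" both come back as BEGIN, while the string "BEGIN" 
itself is not in the table and so is treated as an ordinary 
identifier.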

 >It appears I have to duplicate some of that to make a dictionary,
 >which is OK, but surprising since ANTLR doc/publication stresses
 >efficiency.  i.e. it seems the target could've reorg'd it in such
 >a way as to provide this vs. requiring manual duplication of it.
 >Just thinking out loud, not complaining...overall, I'm loving
 >ANTLR.  :-)

Well, this is a bit of a side path, after all.  That's not how 
you'd normally define keywords.  (At least, it's not how I ever do 
it; some people do prefer that sort of thing though.)