[antlr-interest] QUESTION on: How do I handle abbreviated keywords?
Gavin Lambert
antlr at mirality.co.nz
Fri Oct 31 23:36:05 PDT 2008
At 18:53 1/11/2008, Ben Gillis wrote:
>In my CSharp2 target, there *already* is both components necessary
>for this dictionary; string values of the tokens and the
>corresponding integer token type.
No, there isn't :)
If you have a token called BEGIN, the string you'll find in
tokenNames is "BEGIN" -- which is a different string from "begin",
the keyword in the input language that you actually want to
match. And it's different again from "bgn", which might be a
valid abbreviation for the same keyword (and thus should translate
to the same token type).
The token name is purely arbitrary -- it could be called
BEGIN_KEYWORD, or KW_BEGIN, or BLOCK_START, or even FOO. While
it's usually convenient to name it similar to what it's going to
end up matching in the input language, there's no requirement to
do so -- and that's especially true of imaginary tokens, which
don't actually match anything in the input language at all (or at
least not directly).
There's no way this kind of information can be generated
automatically -- hence if you want to do things that way, then you
have to do them yourself :)
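
As a rough illustration of "doing it yourself", here is a minimal sketch of a hand-maintained table that maps each accepted spelling to a token type. It is written in Java rather than C# purely for illustration. The class name KeywordTable, the lookup method, and the numeric token types are all hypothetical; in a real project the types would be the constants ANTLR generates for your grammar. The spellings "begin" and "bgn" are the ones from the example above; "end" is added only to fill out the table.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical hand-maintained keyword table; nothing here is generated by ANTLR.
class KeywordTable {
    // Assumed token types, standing in for the constants ANTLR would generate.
    static final int ID = 4;
    static final int BEGIN = 5;
    static final int END = 6;

    private static final Map<String, Integer> KEYWORDS = new HashMap<String, Integer>();
    static {
        // Every accepted spelling is listed by hand, because the token name
        // "BEGIN" says nothing about how the input language spells the keyword.
        KEYWORDS.put("begin", BEGIN);
        KEYWORDS.put("bgn", BEGIN);   // abbreviation, same token type
        KEYWORDS.put("end", END);
    }

    // Returns the keyword's token type, or ID if the text is not a listed keyword.
    static int lookup(String text) {
        Integer type = KEYWORDS.get(text);
        return type != null ? type : ID;
    }
}
```

With this table, KeywordTable.lookup("begin") and KeywordTable.lookup("bgn") both give BEGIN, while KeywordTable.lookup("BEGIN") gives ID, since the token name itself was never listed as a spelling. One place such a lookup might be called is an action on the identifier rule in the lexer that overrides the token type, but that wiring depends on your grammar and target.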
>It appears I have to duplicate some of that to make a dictionary,
>which is OK, but surprising since ANTLR doc/publication stresses
>efficiency. i.e. it seems the target could've reorg'd it in such
>a way as to provide this vs. requiring manual duplication of it.
>Just thinking out loud, not complaining...overall, I'm loving
>ANTLR. :-)
Well, this is a bit of a side path, after all. That's not how
you'd normally define keywords. (At least, it's not how I ever do
it; some people do prefer that sort of thing though.)