[antlr-interest] Tokenising for context specific reserved words

Jim Idle jimi at temporal-wave.com
Thu Jul 17 18:19:24 PDT 2008


On Thu, 2008-07-17 at 17:36 -0700, Loring Craymer wrote:
> For Yggdrasil, I hide the sempred behind doubly-quoted keywords.  As
> to performance:  the sempred is called less often than id (as a
> rule--YMMV) and usually much less often.  The issue is aggregate
> performance, not local performance; the general principle for
> performance tweaking is to worry less about the cost of infrequent
> calls than the cost of frequent calls.  Basically, the id approach
> adds a method call and bitset inclusion test for every ID, while the
> sempred costs the three calls per keyword test.


OK - I see where you are going. However, in most of the cases I come
across you would be doing those 3 calls for every keyword, and I think
the grammar would quickly become unreadable. Most languages where this
happens allow almost all keywords to be used as identifiers wherever
they are not acting as the keyword itself. The lesson then is probably
to step back before implementing either solution and see which makes
sense for your particular situation. For instance, I can imagine that a
case where a new version of the language introduces a few new keywords,
but still allows them as identifiers for backward compatibility, may
well qualify as a sempred candidate.
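
To make the trade-off concrete, here is a minimal sketch of the two
approaches, written as a hand-rolled Python parser rather than as ANTLR
grammar or generated code. Everything in it is an assumption made up for
illustration: the toy statement "select <id> from <id>", the keyword
table, and the function names. In the first variant the lexer always
emits keyword tokens and an id rule accepts ID or any keyword (a set
membership test standing in for the bitset test). In the second the
lexer emits only ID tokens and the parser checks the token text at the
point where a keyword is required, which is roughly what the sempred
does.

```python
import re
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    type: str  # "ID", or a keyword type such as "SELECT"
    text: str


# Hypothetical keyword table for the toy language.
KEYWORDS = {"select": "SELECT", "from": "FROM"}
# Token types the id rule accepts: plain IDs plus every keyword.
ID_LIKE = frozenset({"ID"} | set(KEYWORDS.values()))


def lex_with_keywords(src: str) -> List[Token]:
    """Lexer for the id-rule approach: keywords get their own token type."""
    return [Token(KEYWORDS.get(w.lower(), "ID"), w)
            for w in re.findall(r"\w+", src)]


def lex_ids_only(src: str) -> List[Token]:
    """Lexer for the predicate approach: every word is a plain ID."""
    return [Token("ID", w) for w in re.findall(r"\w+", src)]


def parse_id_rule(tokens: List[Token]) -> Tuple[str, str]:
    """stmt : SELECT id FROM id ;   id : ID | SELECT | FROM ;"""
    def expect(i: int, type_: str) -> None:
        if tokens[i].type != type_:
            raise SyntaxError(f"expected {type_}, got {tokens[i].text!r}")

    def ident(i: int) -> str:
        # One extra call plus a set membership test for every identifier.
        if tokens[i].type not in ID_LIKE:
            raise SyntaxError(f"expected identifier, got {tokens[i].text!r}")
        return tokens[i].text

    if len(tokens) != 4:
        raise SyntaxError("expected exactly four tokens")
    expect(0, "SELECT")
    column = ident(1)
    expect(2, "FROM")
    return column, ident(3)


def parse_sempred(tokens: List[Token]) -> Tuple[str, str]:
    """stmt : {text=='select'}? ID  ID  {text=='from'}? ID  ID ;"""
    def keyword(i: int, word: str) -> None:
        # The text comparison runs only where a keyword is required.
        if tokens[i].text.lower() != word:
            raise SyntaxError(f"expected {word!r}, got {tokens[i].text!r}")

    if len(tokens) != 4:
        raise SyntaxError("expected exactly four tokens")
    keyword(0, "select")
    column = tokens[1].text  # any ID will do, including "from"
    keyword(2, "from")
    return column, tokens[3].text
```

Both variants accept "select from from select" and treat the second and
fourth words as identifiers. The difference is where the work lands: the
first pays a small cost on every identifier, the second pays at every
keyword position and scatters text checks through the parser rules.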

There are probably better generic solutions for the whole keyword vs ID
issue. Double quoting keywords seems like a reasonable way to flag
something as also being available as an identifier, but it then forces
the sempred route unless it is further adorned with constructs that may
well inextricably link the parser and lexer, which is probably best
avoided.

Jim
