[antlr-interest] Tokenising for context specific reserved words

Johannes Luber jaluber at gmx.de
Fri Jul 18 01:12:18 PDT 2008


Jim Idle wrote:
> On Thu, 2008-07-17 at 17:36 -0700, Loring Craymer wrote:
>> For Yggdrasil, I hide the sempred behind doubly-quoted keywords.  As 
>> to performance:  the sempred is called less often than id (as a 
>> rule--YMMV) and usually much less often.  The issue is aggregate 
>> performance, not local performance; the general principle for 
>> performance tweaking is to worry less about the cost of infrequent 
>> calls than the cost of frequent calls.  Basically, the id approach 
>> adds a method call and bitset inclusion test for every ID, while the 
>> sempred costs three calls per keyword test.
> 
> OK - I see where you are going. However, in most of the cases I come 
> across you would be doing those 3 calls for every keyword, and I think 
> the grammar would quickly become unreadable.

One should create a dedicated rule that tests whether a given ID is a 
particular keyword, since this removes the code duplication. It may add 
another method call, but most compilers should inline methods of this 
kind as an optimization.
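
A minimal sketch of what I mean, in plain Java rather than in a grammar. 
Everything here is hypothetical and for illustration only: the class 
KeywordPredicateSketch, its list of token texts, and the isKeyword, 
matchKeyword and matchId helpers are stand-ins, not ANTLR's API. It 
assumes the lexer returns every word as an ID and that keyword 
comparison is case-sensitive. In an actual ANTLR 3 grammar the check 
would presumably sit in a semantic predicate in front of ID, with the 
comparison kept in one shared helper instead of being repeated per 
keyword.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a parser over a token stream; not ANTLR's API.
class KeywordPredicateSketch {
    // Assumption: the lexer emitted every word as an ID, so only the text is kept.
    private final List<String> tokens;
    private int pos = 0;

    KeywordPredicateSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    // The single shared test: is the next ID spelled exactly like this keyword?
    boolean isKeyword(String keyword) {
        return pos < tokens.size() && tokens.get(pos).equals(keyword);
    }

    // Consume the next token only if it is the expected context-specific keyword.
    private void matchKeyword(String keyword) {
        if (!isKeyword(keyword)) {
            throw new IllegalStateException(
                "expected keyword '" + keyword + "' at token " + pos);
        }
        pos++;
    }

    // Consume any ID, including one that happens to be spelled like a keyword.
    private String matchId() {
        if (pos >= tokens.size()) {
            throw new IllegalStateException("expected identifier at end of input");
        }
        return tokens.get(pos++);
    }

    // Roughly: createTable : 'create' 'table' ID ; with both keywords lexed as ID.
    String parseCreateTable() {
        matchKeyword("create");
        matchKeyword("table");
        return matchId();
    }

    // Convenience wrapper: parse the given token texts and return the table name.
    static String tableName(String... tokenTexts) {
        return new KeywordPredicateSketch(Arrays.asList(tokenTexts)).parseCreateTable();
    }

    public static void main(String[] args) {
        // The third token is an identifier that is spelled like the keyword.
        System.out.println(tableName("create", "table", "table")); // prints "table"
    }
}
```

The comparison lives only in isKeyword, so each additional keyword costs 
one more call at the place where it is used instead of another copy of 
the test.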

Johannes

> Most languages where this happens allow 
> almost all keywords to be used as identifiers when they are not in fact 
> the actual keyword. The lesson then is probably to step back from the 
> solution before implementing either one and see which makes sense for 
> your particular situation. I can imagine that cases where a few new 
> keywords are introduced in a new version of the language but for 
> backward compatibility reasons they are allowed to be identifiers, may 
> well qualify as a sempred candidate for instance.
> 
> There are probably better generic solutions for the whole keyword vs ID 
> issue. Double quoting keywords seems like a reasonable way to flag 
> something as also being available as an identifier, but then it forces 
> the sempred route unless it is further adorned with constructs that may 
> well then inextricably link the parser and lexer, which is 
> probably/possibly best avoided.
> 
> Jim
> 
>>
