[antlr-interest] Tokenising for context specific reserved words

Loring Craymer lgcraymer at yahoo.com
Thu Jul 17 17:36:14 PDT 2008


For Yggdrasil, I hide the sempred behind doubly-quoted keywords.  As to performance:  the sempred is called less often than id (as a rule; YMMV), and usually much less often.  The issue is aggregate performance, not local performance; the general principle for performance tweaking is to worry more about the cost of frequent calls than about the cost of infrequent ones.  Basically, the id approach adds a method call and a bitset inclusion test for every ID, while the sempred costs three calls per keyword test.
--Loring
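
The aggregate-cost argument above can be put into a rough back-of-the-envelope sketch. Everything here is an assumption made for illustration: the class and method names are hypothetical, every call or test is counted as one equally expensive operation (which a real JVM will not honour), and the workload figures in main are made up rather than measured.

```java
final class AggregateCostSketch {
    // Assumed unit costs, read loosely from the message above:
    // the id rule adds one method call plus one bitset test per identifier,
    // while the predicate makes three calls per keyword test.
    static final int ID_RULE_OPS_PER_IDENTIFIER = 2;
    static final int SEMPRED_OPS_PER_KEYWORD_TEST = 3;

    /** Total extra operations if every identifier goes through an id rule. */
    static long idRuleCost(long identifiers) {
        return identifiers * ID_RULE_OPS_PER_IDENTIFIER;
    }

    /** Total extra operations if only keyword positions run the predicate. */
    static long sempredCost(long keywordTests) {
        return keywordTests * SEMPRED_OPS_PER_KEYWORD_TEST;
    }

    /** True when the predicate approach does less total work under these assumptions. */
    static boolean sempredCheaper(long identifiers, long keywordTests) {
        return sempredCost(keywordTests) < idRuleCost(identifiers);
    }

    public static void main(String[] args) {
        long identifiers = 10_000;  // hypothetical count of ID tokens parsed
        long keywordTests = 500;    // hypothetical count of predicate evaluations
        System.out.println("id rule: " + idRuleCost(identifiers)
                + " ops, sempred: " + sempredCost(keywordTests) + " ops");
    }
}
```

Under these assumed unit costs the predicate wins whenever keyword tests number fewer than two thirds of the identifiers; only profiling a real grammar would show whether that holds in practice.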



----- Original Message ----
From: Jim Idle <jimi at temporal-wave.com>
To: Loring Craymer <lgcraymer at yahoo.com>
Cc: antlr-interest <antlr-interest at antlr.org>
Sent: Thursday, July 17, 2008 5:22:55 PM
Subject: Re: [antlr-interest] Tokenising for context specific reserved words

On Thu, 2008-07-17 at 16:49 -0700, Loring Craymer wrote: 
> That is one solution; however, semantic predicates-- { input.LT(1).getText().equals("foo") }? ID --are much to be preferred when there are lots of potential keywords and cost less in terms of performance since they avoid the id method call for the general case.  (Or should cost less:  ANTLR 3 currently does not reduce the generated conditionals.)

Personally I think that construct is almost unreadable, and it involves invoking LT() and getText() (which means creating the string out of the input stream), then a string comparison, which is another method call in itself. I can't see how that will cost less than looking for a token value, as it invokes three method calls. Java doesn't seem to do a great job of optimizing conditionals, but it should be able to do better than two method calls, constructing a string via substring, and a string comparison, I should think. I would also think that the DFA is faster than that construct.
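
For readers who want to see the three calls in question, here is a minimal self-contained sketch of what the predicate does at runtime. The Tok and Stream classes are hypothetical stand-ins written for this illustration, not ANTLR's own runtime classes, and the token type numbers and the idsOf helper are likewise invented; the sketch only assumes that token text is materialised from the input buffer on demand.

```java
import java.util.ArrayList;
import java.util.List;

final class KeywordPredicateSketch {
    static final int ID = 1;    // hypothetical token type numbers
    static final int EOF = -1;

    /** Stand-in token: keeps offsets into the input rather than a String. */
    static final class Tok {
        final int type, start, stop;  // stop is exclusive
        final String input;
        Tok(int type, String input, int start, int stop) {
            this.type = type; this.input = input; this.start = start; this.stop = stop;
        }
        // Call 2: builds a fresh String from the input buffer on each call.
        String getText() { return input.substring(start, stop); }
    }

    /** Stand-in token stream with 1-based lookahead. */
    static final class Stream {
        private final List<Tok> toks;
        private int pos = 0;
        Stream(List<Tok> toks) { this.toks = toks; }
        // Call 1: fetch the k-th lookahead token, or an EOF token past the end.
        Tok LT(int k) {
            int i = pos + k - 1;
            return i < toks.size() ? toks.get(i) : new Tok(EOF, "", 0, 0);
        }
        void consume() { pos++; }
    }

    /** Mirrors { input.LT(1).getText().equals(word) }? ID */
    static boolean matchKeyword(Stream input, String word) {
        Tok t = input.LT(1);
        if (t.type == ID && t.getText().equals(word)) {  // call 3: equals()
            input.consume();
            return true;
        }
        return false;
    }

    /** Toy lexer: every whitespace-separated word becomes an ID token. */
    static Stream idsOf(String src) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            if (Character.isWhitespace(src.charAt(i))) { i++; continue; }
            int j = i;
            while (j < src.length() && !Character.isWhitespace(src.charAt(j))) j++;
            out.add(new Tok(ID, src, i, j));
            i = j;
        }
        return new Stream(out);
    }

    public static void main(String[] args) {
        Stream s = idsOf("foo bar");
        System.out.println(matchKeyword(s, "foo"));  // keyword test at a position expecting "foo"
    }
}
```

The alternative under discussion, a keywords rule, would instead compare an integer token type, so no String is built at the point of the test.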

My preference is based, I admit, upon the observed performance in C, where the keywords rule is a much better performer (though I might go recheck that to make sure ;-). Maybe the opposite is indeed true for Java.

Jim 



      