[antlr-interest] Tokenising for context specific reserved words

Roshan James roshanj at google.com
Thu Jul 17 18:18:59 PDT 2008


Thank you for your responses. I am biased against using semantic predicates
because I want to keep the target language out of the parser that generates
the AST as much as possible.

Are there any approaches other than Jim's?

The downside I see to this approach is that one always needs to keep a
global list of keywords under the 'id' production. This becomes a bit of a
nightmare for maintenance.
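
To make the trade-off concrete, here is a rough Java sketch of the two checks
being compared in this thread. It is not ANTLR-generated code: the token type
constants, the Tok class and both helper methods are hypothetical names made
up for illustration. The first method mirrors the 'id' production idea (a
hand-maintained set of token types that may act as identifiers); the second
mirrors the predicate idea (the lexer emits only ID and the keyword is
recognised by its text).

```java
import java.util.BitSet;

public class KeywordCheckSketch {
    // Hypothetical token types; a real generated parser defines its own.
    static final int ID = 4;
    static final int KW_FOO = 5;
    static final int KW_BAR = 6;

    // Minimal stand-in for a token: a type plus its text.
    static final class Tok {
        final int type;
        final String text;
        Tok(int type, String text) { this.type = type; this.text = text; }
    }

    // Approach 1: the 'id' production accepts ID or any keyword token.
    // This is the global list that has to be updated for every new keyword.
    static final BitSet ID_OR_KEYWORD = new BitSet();
    static {
        ID_OR_KEYWORD.set(ID);
        ID_OR_KEYWORD.set(KW_FOO);
        ID_OR_KEYWORD.set(KW_BAR);
    }

    // One bitset inclusion test per identifier-like token.
    static boolean matchesIdRule(Tok t) {
        return ID_OR_KEYWORD.get(t.type);
    }

    // Approach 2: keywords are plain ID tokens, recognised by text only
    // where the grammar expects that keyword (a string comparison).
    static boolean isKeyword(Tok t, String keyword) {
        return t.type == ID && keyword.equals(t.text);
    }

    public static void main(String[] args) {
        Tok asKeywordToken = new Tok(KW_FOO, "foo"); // lexer knows 'foo'
        Tok asPlainId = new Tok(ID, "foo");          // lexer emits only ID
        System.out.println(matchesIdRule(asKeywordToken));
        System.out.println(isKeyword(asPlainId, "foo"));
    }
}
```

With the first approach the keyword set lives in one place but must be kept
in sync with the lexer; with the second there is no list, but each keyword
test pays for a text comparison.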

Roshan



On Thu, Jul 17, 2008 at 5:36 PM, Loring Craymer <lgcraymer at yahoo.com> wrote:

> For Yggdrasil, I hide the sempred behind doubly-quoted keywords.  As to
> performance:  the sempred is called less often than id (as a rule--YMMV) and
> usually much less often.  The issue is aggregate performance, not local
> performance; the general principle for performance tweaking is to worry less
> about the cost of infrequent calls than the cost of frequent calls.
> Basically, the id approach adds a method call and bitset inclusion test for
> every ID, while the sempred costs three calls per keyword test.
>
>
>
> --Loring
>
>
> ----- Original Message ----
> From: Jim Idle <jimi at temporal-wave.com>
> To: Loring Craymer <lgcraymer at yahoo.com>
> Cc: antlr-interest <antlr-interest at antlr.org>
> Sent: Thursday, July 17, 2008 5:22:55 PM
> Subject: Re: [antlr-interest] Tokenising for context specific reserved
> words
>
> On Thu, 2008-07-17 at 16:49 -0700, Loring Craymer wrote:
>
> That is one solution; however, semantic predicates--
> { input.LT(1).getText().equals("foo") }? ID
> --are much to be preferred when there are lots of potential keywords and
> cost less in terms of performance since they avoid the id method call for
> the general case.  (Or should cost less:  ANTLR 3 currently does not reduce
> the generated conditionals.)
>
>
> Personally I think that that construct is almost unreadable and it involves
> invoking LT(), getText() - which means creating the string out of the input
> stream, then a string comparison, which is another method call in itself. I
> can't see how that will cost less than looking for a token value as it
> invokes three method calls. Java doesn't seem to do a great job of
> optimizing conditionals, but it should be able to do better than two method
> calls, constructing a string via substring and a string comparison I should
> think? I would also think that the DFA is faster than that construct.
>
> My preference is based upon the performance I have observed in C, I admit,
> where the keywords rule is a much better performer (though I might go
> recheck that to make sure ;-). Maybe the opposite is indeed true for Java.
>
> Jim
>
>
>
>