[antlr-interest] Tokenising for context specific reserved words

Fri Jul 18 01:59:23 PDT 2008

Actually, I just thought of a hybrid approach: support subtypes for double-quoted literals and have the lexer set the subtype field. A double-quoted literal reference in a grammar is then matched by checking the subtype field, while an ID reference checks the type field. Apart from the extra field (which probably costs nothing, since minimum allocation sizes are usually larger than tokens), the only added runtime overhead would be in the lexer, to set the subtype field.
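
A rough sketch of how the type/subtype split described above might look, in plain Java rather than generated ANTLR code. Everything named here (SubtypeTokenSketch, Token, the KW_* constants, lex, matchesId, matchesKeyword) is hypothetical and is not part of the ANTLR runtime; the whitespace-splitting lexer is only a stand-in for a real one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SubtypeTokenSketch {
    // Hypothetical type and subtype codes, invented for this sketch.
    static final int ID = 1;
    static final int NO_SUBTYPE = 0;
    static final int KW_SELECT = 100;
    static final int KW_FROM = 101;

    // Keyword text -> subtype code, consulted once per word by the lexer.
    static final Map<String, Integer> KEYWORD_SUBTYPES = new HashMap<>();
    static {
        KEYWORD_SUBTYPES.put("select", KW_SELECT);
        KEYWORD_SUBTYPES.put("from", KW_FROM);
    }

    // A token carries both fields: the type stays ID, the subtype flags a keyword.
    static final class Token {
        final int type;
        final int subtype;
        final String text;

        Token(int type, int subtype, String text) {
            this.type = type;
            this.subtype = subtype;
            this.text = text;
        }
    }

    // Toy lexer: every word becomes an ID token; setting the subtype here
    // is the only added runtime cost.
    static List<Token> lex(String input) {
        List<Token> tokens = new ArrayList<>();
        for (String word : input.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            int subtype = KEYWORD_SUBTYPES.getOrDefault(word, NO_SUBTYPE);
            tokens.add(new Token(ID, subtype, word));
        }
        return tokens;
    }

    // An ID reference in the grammar looks only at the type field.
    static boolean matchesId(Token t) {
        return t.type == ID;
    }

    // A double-quoted literal reference looks only at the subtype field.
    static boolean matchesKeyword(Token t, int keywordSubtype) {
        return t.subtype == keywordSubtype;
    }

    public static void main(String[] args) {
        for (Token t : lex("select from from t")) {
            System.out.println(t.text + " type=" + t.type + " subtype=" + t.subtype);
        }
    }
}
```

With this arrangement a word such as "from" satisfies both matchesId and matchesKeyword(t, KW_FROM), so the parser's position decides which reading applies and no semantic predicate is needed at the point of use.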

--Loring

----- Original Message ----
> From: Johannes Luber <jaluber at gmx.de>
> To: Jim Idle <jimi at temporal-wave.com>
> Cc: Loring Craymer <lgcraymer at yahoo.com>; antlr-interest <antlr-interest at antlr.org>
> Sent: Friday, July 18, 2008 1:12:18 AM
> Subject: Re: [antlr-interest] Tokenising for context specific reserved words
> 
> Jim Idle wrote:
> > On Thu, 2008-07-17 at 17:36 -0700, Loring Craymer wrote:
> >> For Yggdrasil, I hide the sempred behind doubly-quoted keywords.  As 
> >> to performance:  the sempred is called less often than id (as a 
> >> rule--YMMV) and usually much less often.  The issue is aggregate 
> >> performance, not local performance; the general principle for 
> >> performance tweaking is to worry less about the cost of infrequent 
> >> calls than the cost of frequent calls.  Basically, the id approach 
> >> adds a method call and bitset inclusion test for every ID, while the 
> >> sempred costs the three calls per keyword test.
> > 
> > OK - I see where you are going. However, most of the cases I come across 
> > mean that you would be doing those 3 calls for every keyword and I think 
> > it would be quickly unreadable.
> 
> One should create a special rule to test whether a given ID is a
> keyword, as this strategy removes code duplication. It may add another
> method call, but for this kind of method most compilers should inline
> the call as an optimization.
> 
> Johannes
> 
> > Most languages where this happens allow 
> > almost all keywords to be used as identifiers when they are not in fact 
> > the actual keyword. The lesson then is probably to step back from the 
> > solution before implementing either one and see which makes sense for 
> > your particular situation. I can imagine that cases where a few new 
> > keywords are introduced in a new version of the language but for 
> > backward compatibility reasons they are allowed to be identifiers, may 
> > well qualify as a sempred candidate for instance.
> > 
> > There are probably better generic solutions for the whole keyword vs ID 
> > issue. Double quoting keywords seems like a reasonable way to flag 
> > something as also being available as an identifier, but then it forces 
> > the sempred route unless it is further adorned with constructs that may 
> > well then inextricably link the parser and lexer, which is 
> > probably/possibly best avoided.
> > 
> > Jim
> > 
> >>