[antlr-interest] Tokenising for context specific reserved words
Loring Craymer
lgcraymer at yahoo.com
Fri Jul 18 01:59:23 PDT 2008
Actually, I just thought of a hybrid approach: support subtypes for double-quoted literals and have the lexer set the subtype field. Then a double-quoted literal reference in a grammar is handled by looking at the subtype field, while an ID reference looks at the type field. Except for the extra field (which probably costs nothing--minimum allocation sizes are usually larger than tokens), the only added runtime overhead would be in the lexer, which has to set the subtype field.
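A minimal sketch of that idea in Java, for illustration only. Every name here (SubtypedToken, SubtypeLexerSketch, the KW_* constants, the whitespace-splitting lexer) is hypothetical and not part of the ANTLR runtime; it only shows a token carrying both a type and a lexer-assigned subtype, with the two kinds of grammar reference checking different fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical token with an extra subtype field (not ANTLR's Token class).
final class SubtypedToken {
    static final int ID = 1;          // token type shared by all identifier-like words
    static final int NO_SUBTYPE = 0;  // plain identifier, not a keyword spelling
    static final int KW_SELECT = 1;   // illustrative keyword subtypes
    static final int KW_FROM = 2;

    final int type;
    final int subtype;
    final String text;

    SubtypedToken(int type, int subtype, String text) {
        this.type = type;
        this.subtype = subtype;
        this.text = text;
    }
}

final class SubtypeLexerSketch {
    // Assumed keyword table; a generated lexer would build this from the
    // double-quoted literals that appear in the grammar.
    private static final Map<String, Integer> KEYWORDS =
            Map.of("select", SubtypedToken.KW_SELECT, "from", SubtypedToken.KW_FROM);

    // Toy lexer: splits on whitespace and types every word as ID.
    // The only extra work is one table lookup to set the subtype.
    static List<SubtypedToken> lex(String input) {
        List<SubtypedToken> tokens = new ArrayList<>();
        for (String word : input.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            int subtype = KEYWORDS.getOrDefault(word, SubtypedToken.NO_SUBTYPE);
            tokens.add(new SubtypedToken(SubtypedToken.ID, subtype, word));
        }
        return tokens;
    }

    // What a double-quoted literal reference such as "select" would test.
    static boolean matchesLiteral(SubtypedToken t, int keywordSubtype) {
        return t.subtype == keywordSubtype;
    }

    // What an ID reference would test: the type field only.
    static boolean matchesId(SubtypedToken t) {
        return t.type == SubtypedToken.ID;
    }

    public static void main(String[] args) {
        for (SubtypedToken t : lex("select from t")) {
            System.out.println(t.text + " type=" + t.type + " subtype=" + t.subtype);
        }
    }
}
```

In this sketch a word such as "from" satisfies both matchesId and matchesLiteral(t, KW_FROM), so the parser decides from context which reading applies, and no predicate has to compare token text at parse time.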
--Loring
----- Original Message ----
> From: Johannes Luber <jaluber at gmx.de>
> To: Jim Idle <jimi at temporal-wave.com>
> Cc: Loring Craymer <lgcraymer at yahoo.com>; antlr-interest <antlr-interest at antlr.org>
> Sent: Friday, July 18, 2008 1:12:18 AM
> Subject: Re: [antlr-interest] Tokenising for context specific reserved words
>
> Jim Idle schrieb:
> > On Thu, 2008-07-17 at 17:36 -0700, Loring Craymer wrote:
> >> For Yggdrasil, I hide the sempred behind doubly-quoted keywords. As
> >> to performance: the sempred is called less often than id (as a
> >> rule--YMMV) and usually much less often. The issue is aggregate
> >> performance, not local performance; the general principle for
> >> performance tweaking is to worry less about the cost of infrequent
> >> calls than the cost of frequent calls. Basically, the id approach
> >> adds a method call and bitset inclusion test for every ID, while the
> >> sempred costs three calls per keyword test.
> >
> > OK - I see where you are going. However, most of the cases I come across
> > mean that you would be doing those 3 calls for every keyword and I think
> > it would quickly become unreadable.
>
> One should create a special rule to test whether a given ID is a keyword,
> as this strategy removes code duplication. It may add another method call,
> but for this kind of method most compilers should inline the call as an
> optimization.
>
> Johannes
>
> > Most languages where this happens allow
> > almost all keywords to be used as identifiers when they are not in fact
> > the actual keyword. The lesson then is probably to step back from the
> > solution before implementing either one and see which makes sense for
> > your particular situation. I can imagine that cases where a few new
> > keywords are introduced in a new version of the language but for
> > backward compatibility reasons they are allowed to be identifiers, may
> > well qualify as a sempred candidate for instance.
> >
> > There are probably better generic solutions for the whole keyword vs ID
> > issue. Double quoting keywords seems like a reasonable way to flag
> > something as also being available as an identifier, but then it forces
> > the sempred route unless it is further adorned with constructs that may
> > well then inextricably link the parser and lexer, which is
> > probably/possibly best avoided.
> >
> > Jim
> >
> >>
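For reference, the single keyword test that Johannes suggests in the quoted message might look roughly like the sketch below. It assumes the inline predicate is a comparison of the next token's text against the keyword spelling; the Lookahead class and the isKeyword helper are hypothetical stand-ins for the parser's token stream, not ANTLR API.

```java
import java.util.List;

final class KeywordPredicateSketch {
    // Hypothetical minimal lookahead buffer standing in for the parser's token stream.
    static final class Lookahead {
        private final List<String> texts;
        private int pos = 0;

        Lookahead(List<String> texts) {
            this.texts = texts;
        }

        // Text of the k-th token of lookahead (1-based), or null past the end of input.
        String textAt(int k) {
            int i = pos + k - 1;
            return i < texts.size() ? texts.get(i) : null;
        }

        void consume() {
            pos++;
        }
    }

    // The one shared test: each keyword rule calls this instead of repeating
    // the lookahead, get-text, and compare chain inline.
    static boolean isKeyword(Lookahead input, String keyword) {
        return keyword.equals(input.textAt(1));
    }

    public static void main(String[] args) {
        Lookahead input = new Lookahead(List.of("limit", "x"));
        System.out.println(isKeyword(input, "limit"));
        input.consume();
        System.out.println(isKeyword(input, "limit"));
    }
}
```

The helper is small and static, which is the kind of method a JIT compiler can usually inline, so the extra call is unlikely to matter next to the text comparison itself.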