[antlr-interest] Context Sensitive Keyword Support?

Wed Feb 16 21:05:01 PST 2011

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Scott Stanchfield
> Sent: Wednesday, February 16, 2011 8:06 PM
> To: antlr-interest Interest
> Subject: Re: [antlr-interest] Context Sensitive Keyword Support?
>
> Two things I don't like about that approach:
>
> * The scanner would need to match each possibility against the text -
> adds time to the scan when you really only need one of the
> possibilities. My approach has the parser tell the scanner what tokens
> are possible and the scanner will only consider those possibilities

It does not do that. It does a single match, as it does now, and then tells
the token what its possible types are. I think you are missing that the
parser-driven lexer basically falls down in a number of areas to do with
lookahead. In your case (I have not looked at your code yet, though) the
parser would have to call the lexer and say "can I have an X, a Y and a Z?",
and the lexer would say yes on the X, yes on the Y, then nay on the Z, so you
would have to backtrack and re-scan? This is why you end up with a
scannerless parser instead. With the superposition token, you cannot have
ambiguous lexer definitions, but the tokens so scanned may answer to a
number of possibilities; I feel that this covers the huge majority of
cases.
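
To make the superposition idea concrete, here is a rough sketch in Python. It
is not ANTLR code: Token, KEYWORDS, scan, match and parse_select are names
made up for the illustration, and the tiny grammar is an assumption. The lexer
matches each word exactly once and flags every type the word may answer to;
the parser then picks the type it expects at that point.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword table: every keyword may also be a plain ID.
KEYWORDS = {"select": "SELECT", "from": "FROM", "where": "WHERE"}


@dataclass(frozen=True)
class Token:
    text: str
    types: frozenset  # every token type this text may answer to


def scan(source):
    """Lexer side: one match per word, then flag the possible types."""
    tokens = []
    for word in re.findall(r"[A-Za-z_]\w*", source):
        types = {"ID"}
        keyword = KEYWORDS.get(word.lower())
        if keyword is not None:
            types.add(keyword)
        tokens.append(Token(word, frozenset(types)))
    return tokens


def match(tokens, pos, expected):
    """Parser side: accept the token if it can be the expected type."""
    if pos < len(tokens) and expected in tokens[pos].types:
        return tokens[pos], pos + 1
    raise SyntaxError(f"expected {expected} at token {pos}")


def parse_select(tokens):
    """Assumed rule: SELECT ID FROM ID (WHERE ID)?"""
    _, pos = match(tokens, 0, "SELECT")
    column, pos = match(tokens, pos, "ID")
    _, pos = match(tokens, pos, "FROM")
    table, pos = match(tokens, pos, "ID")
    condition = None
    if pos < len(tokens) and "WHERE" in tokens[pos].types:
        _, pos = match(tokens, pos, "WHERE")
        condition_token, pos = match(tokens, pos, "ID")
        condition = condition_token.text
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at {pos}")
    return column.text, table.text, condition
```

With this, "select where from where" parses with both the column and the table
named "where", because the parser asks for an ID in those positions and the
token is flagged as a possible ID as well as a possible WHERE. No context is
needed in the lexer.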

>
> * It really only works for very discrete token values. Suppose the
> parser were switching to an embedded language;

You really need lexer modes for that, which will be in v4 I think.

> that embedded language
> could use multi-word tokens or delimit things like comments in a very
> different manner.

You end up having context one way or another but, without second-guessing
you, I think you will find that you are limited in lookahead and predicates,
and that it is ultimately just better to go scannerless. Hand-crafted
parsers often do what you are suggesting, so it isn't without merit, but I
think it has practical limitations for a generic LL(k) recognizer
generator.
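
A minimal sketch of the re-scan problem, in Python. It assumes a lexer that
only tries the token types the parser asks for; it is not Scott's code (which
I have not seen) and not ANTLR's API. ParserDrivenLexer, PATTERNS and
try_alternative are hypothetical names. When an alternative fails part-way
through its lookahead, the parser has to rewind to the same character position
and scan the same text again for the next alternative.

```python
import re

# Hypothetical token definitions for the illustration.
PATTERNS = {
    "ID": re.compile(r"[A-Za-z_]\w*"),
    "NUMBER": re.compile(r"\d+"),
}


class ParserDrivenLexer:
    def __init__(self, text):
        self.text = text
        self.scans = 0  # how many times the parser asked for a match

    def scan(self, pos, allowed):
        """Try only the token types the parser says are possible at pos."""
        self.scans += 1
        while pos < len(self.text) and self.text[pos].isspace():
            pos += 1
        for token_type in allowed:
            found = PATTERNS[token_type].match(self.text, pos)
            if found:
                return token_type, found.group(), found.end()
        return None


def try_alternative(lexer, pos, expected):
    """Ask for each expected type in turn; None means rewind and retry."""
    matched = []
    for token_type in expected:
        result = lexer.scan(pos, [token_type])
        if result is None:
            return None  # the "nay on the Z" case
        matched.append(result[1])
        pos = result[2]
    return matched


lexer = ParserDrivenLexer("x y 42")
first = try_alternative(lexer, 0, ["ID", "ID", "ID"])       # fails on "42"
second = try_alternative(lexer, 0, ["ID", "ID", "NUMBER"])  # re-scans x and y
```

Here the input holds three tokens, but lexer.scans ends up at six: the first
alternative gets a yes on "x" and "y" and a no on "42", and the second
alternative then scans "x" and "y" all over again from position 0.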

>
> I'll have to tinker with this when I get a chance... don't know that
> that's likely until after we've performed (in April) the play I'm
> directing...

Good luck on the play :-)

Jim

> -- Scott
>
> ----------------------------------------
> Scott Stanchfield
> http://javadude.com
>
>
>
> On Wed, Feb 16, 2011 at 10:56 PM, Michael Bedward
> <michael.bedward at gmail.com> wrote:
> > On 17 February 2011 09:21, Jim Idle <jimi at temporal-wave.com> wrote:
> >> I think that the quantum token idea is a much better one in that a
> >> token can simultaneously be ID and WHERE or any other token that it
> >> is flagged as being possible to be. This removes context from the
> >> lexer and allows the parser to decide.
> >>
> >
> > Yes please!!! This seems like a wonderfully elegant and very useful
> > idea.
> >
> > Michael
> >
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > Unsubscribe:
> > http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> >
>