[antlr-interest] Tokenising for context specific reserved words
Gavin Lambert
antlr at mirality.co.nz
Sat Jul 19 18:16:05 PDT 2008
At 08:58 19/07/2008, Terence Parr wrote:
>There was an interesting paper called "Schrodinger's tokens"...
>if you are a physics or quantum physics buff, you get the
>reference ;)
I haven't looked up the paper, but the mental images that the
title alone conjures up sound quite cool :)
All sorts of useful things could be done if the lexer could
generate a token that was simultaneously a member of multiple
types -- e.g. on seeing the input "10", it could generate a token
that could be used as either an INT_LITERAL or a FLOAT_LITERAL, as
parser context demanded (or even BINARY_LITERAL, depending on
domain). And on seeing "if", it could be used as either
IF_KEYWORD or IDENTIFIER.
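
To make that concrete, here is a rough Python sketch of a token that
carries a set of candidate types, with the consumer asking for
whichever type its context wants. This is not ANTLR's API: MultiToken,
lex_word and accepts are names made up purely for illustration, and
the classification rules are assumptions based on the examples above.

```python
from dataclasses import dataclass

# Hypothetical token type names, taken from the examples in the post.
INT_LITERAL = "INT_LITERAL"
FLOAT_LITERAL = "FLOAT_LITERAL"
BINARY_LITERAL = "BINARY_LITERAL"
IF_KEYWORD = "IF_KEYWORD"
IDENTIFIER = "IDENTIFIER"


@dataclass(frozen=True)
class MultiToken:
    """A token that is simultaneously a member of several types."""
    text: str
    types: frozenset


def lex_word(text):
    """Classify one already-split word into every type it could be."""
    if text.isdigit():
        types = {INT_LITERAL, FLOAT_LITERAL}
        # Only digit strings made of 0s and 1s can also be binary.
        if set(text) <= {"0", "1"}:
            types.add(BINARY_LITERAL)
    elif text == "if":
        types = {IF_KEYWORD, IDENTIFIER}
    elif text.isidentifier():
        types = {IDENTIFIER}
    else:
        raise ValueError(f"cannot tokenise {text!r}")
    return MultiToken(text, frozenset(types))


def accepts(token, wanted):
    """Parser-side check: can this token play the role the context wants?"""
    return wanted in token.types
```

Note that the match is a set-membership test rather than a plain
integer comparison, which is where the extra cost per token comparison
would come from.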
Of course that's doable in ANTLR at the moment via helper parser
rules, but it'd be cool if this were a first-party construct. I
guess it could be implemented either as multiple types stored
against a particular instance of a token (which is the most
flexible, but will slow down token comparisons somewhat), or as
hierarchies of tokens (e.g. in the examples above, any INT_LITERAL
can be used in place of a FLOAT_LITERAL, but not the reverse;
similarly, any IF_KEYWORD can be used as an IDENTIFIER, but not
the reverse). The second option is less flexible but it's
probably sufficient for most scenarios I can think of at the
moment, and I suspect it'd end up generating faster code (since
most of it can be dealt with statically).
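
A rough Python sketch of the hierarchy option, under the assumption
that each token type has at most one parent it can stand in for. The
PARENT table and the ancestors/matches helpers are hypothetical names
for illustration, not anything ANTLR provides. Each token still has
exactly one type; the "can be used as" relation is computed once up
front, which is the part a generator could do statically.

```python
# Hypothetical "can be used as" table: each type maps to its direct parent.
PARENT = {
    "INT_LITERAL": "FLOAT_LITERAL",
    "IF_KEYWORD": "IDENTIFIER",
}


def ancestors(tok_type):
    """Yield tok_type itself, then every type it may stand in for."""
    while tok_type is not None:
        yield tok_type
        tok_type = PARENT.get(tok_type)


# Precompute the closure once; this models the work that could be done
# at grammar-generation time rather than per token at parse time.
ALL_TYPES = set(PARENT) | set(PARENT.values())
USABLE_AS = {t: frozenset(ancestors(t)) for t in ALL_TYPES}


def matches(actual, expected):
    """True if a token of type `actual` is acceptable where `expected` is wanted."""
    # Types outside the table are only usable as themselves.
    return expected in USABLE_AS.get(actual, frozenset({actual}))
```

With this table, matches("INT_LITERAL", "FLOAT_LITERAL") holds but
matches("FLOAT_LITERAL", "INT_LITERAL") does not, which is the one-way
substitution described above.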