[antlr-interest] Tokenising for context specific reserved words

Gavin Lambert antlr at mirality.co.nz
Sat Jul 19 18:16:05 PDT 2008


At 08:58 19/07/2008, Terence Parr wrote:
 >There was an interesting paper called "Schrodinger's tokens"...
 >if you are a physics or quantum physics buff, you get the
 >reference ;)

I haven't looked up the paper, but the mental images that the 
title alone conjures up sound quite cool :)

All sorts of useful things could be done if the lexer could 
generate a token that was simultaneously a member of multiple 
types -- eg. on seeing the input "10", it could generate a token 
that could be used as either an INT_LITERAL or a FLOAT_LITERAL, as 
parser context demanded (or even BINARY_LITERAL, depending on 
domain).  And on seeing "if", it could be used as either 
IF_KEYWORD or IDENTIFIER.
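
To make the idea concrete, here is a rough sketch in plain Python (not ANTLR, and not anything ANTLR provides). The names Token, lex_word and accepts are made up for illustration, and the classification rules are toy ones that only cover the examples above.

```python
from dataclasses import dataclass

# Hypothetical token type names, taken from the examples in the text.
INT_LITERAL = "INT_LITERAL"
FLOAT_LITERAL = "FLOAT_LITERAL"
BINARY_LITERAL = "BINARY_LITERAL"
IF_KEYWORD = "IF_KEYWORD"
IDENTIFIER = "IDENTIFIER"


@dataclass(frozen=True)
class Token:
    text: str
    types: frozenset  # every type this token may be read as


def lex_word(text: str) -> Token:
    """Classify one already-split lexeme (toy rules, illustration only)."""
    if text.isascii() and text.isdigit():
        types = {INT_LITERAL, FLOAT_LITERAL}
        if set(text) <= {"0", "1"}:
            # Only digits 0 and 1, so it could also be a binary literal.
            types.add(BINARY_LITERAL)
        return Token(text, frozenset(types))
    if text == "if":
        return Token(text, frozenset({IF_KEYWORD, IDENTIFIER}))
    if text.isidentifier():
        return Token(text, frozenset({IDENTIFIER}))
    raise ValueError(f"unrecognised input: {text!r}")


def accepts(token: Token, wanted: str) -> bool:
    # The parser asks "can this token be read as `wanted`?" rather than
    # comparing against a single fixed type.
    return wanted in token.types
```

With this, accepts(lex_word("10"), FLOAT_LITERAL) and accepts(lex_word("if"), IDENTIFIER) both hold, and the parser context decides which reading is actually used.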

Of course that's doable in ANTLR at the moment via helper parser 
rules (eg. a rule that matches either IDENTIFIER or IF_KEYWORD), 
but it'd be cool if this were a first-party construct.  I 
guess it could be implemented either as multiple types stored 
against a particular instance of a token (which is the most 
flexible, but will slow down token comparisons somewhat), or as 
hierarchies of tokens (eg. in the examples above, any INT_LITERAL 
can be used in place of a FLOAT_LITERAL, but not the reverse; 
similarly, any IF_KEYWORD can be used as an IDENTIFIER, but not 
the reverse).  The second option is less flexible but it's 
probably sufficient for most scenarios I can think of at the 
moment, and I suspect it'd end up generating faster code (since 
most of it can be dealt with statically).
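
A rough sketch of the hierarchy option, again in plain Python rather than anything ANTLR actually does. The PARENT table, ACCEPTED_BY and matches are hypothetical names, and the sketch assumes the hierarchy has no cycles. Each token keeps a single type; the "can be used as" relation is expanded once up front, so a match is one lookup in a fixed table.

```python
# Hypothetical "can be used as" relation: child type -> parent type.
PARENT = {
    "INT_LITERAL": "FLOAT_LITERAL",
    "IF_KEYWORD": "IDENTIFIER",
}


def ancestors(token_type):
    """Return token_type followed by every type it can stand in for."""
    chain = [token_type]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


ALL_TYPES = set(PARENT) | set(PARENT.values())

# Computed once, ahead of parsing: for each type the parser might ask
# for, the set of concrete token types that satisfy it.
ACCEPTED_BY = {
    wanted: frozenset(t for t in ALL_TYPES if wanted in ancestors(t))
    for wanted in ALL_TYPES
}


def matches(actual_type, wanted_type):
    # One-directional: a child matches its parent, never the reverse.
    return actual_type in ACCEPTED_BY[wanted_type]
```

Here matches("INT_LITERAL", "FLOAT_LITERAL") is true while matches("FLOAT_LITERAL", "INT_LITERAL") is false. In generated code the ACCEPTED_BY table would presumably be baked in at generation time instead of built at startup.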


