[antlr-interest] Tokenising for context specific reserved words

Raphael Reitzig r_reitzi at cs.uni-kl.de
Sun Jul 20 09:03:56 PDT 2008


To be consistent, one could generate one class per token type.
Subtyping would then give you what you want, I suppose.
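
As a rough illustration of the one-class-per-token-type idea, here is a
minimal Java sketch. All class and method names below (Token,
FloatLiteral, IntLiteral, Identifier, IfKeyword, TokenMatch.matches) are
hypothetical and not part of the ANTLR runtime; the hierarchy simply
mirrors the examples from the quoted message, where an INT_LITERAL may
stand in for a FLOAT_LITERAL and an IF_KEYWORD for an IDENTIFIER, but not
the reverse.

```java
// Hypothetical sketch: none of these classes exist in the ANTLR runtime.
class Token {
    final String text;
    Token(String text) { this.text = text; }
}

// A position that expects a FLOAT_LITERAL accepts this class and its subclasses.
class FloatLiteral extends Token {
    FloatLiteral(String text) { super(text); }
}

// An INT_LITERAL is also usable wherever a FLOAT_LITERAL is expected.
class IntLiteral extends FloatLiteral {
    IntLiteral(String text) { super(text); }
}

class Identifier extends Token {
    Identifier(String text) { super(text); }
}

// An IF_KEYWORD is also usable wherever an IDENTIFIER is expected.
class IfKeyword extends Identifier {
    IfKeyword(String text) { super(text); }
}

final class TokenMatch {
    // Parser-side check: true if the token's class is the expected
    // type or one of its subtypes.
    static boolean matches(Token token, Class<? extends Token> expected) {
        return expected.isInstance(token);
    }
}
```

With this layout, TokenMatch.matches(new IntLiteral("10"),
FloatLiteral.class) holds, while a plain FloatLiteral does not match
IntLiteral.class. A real generator would more likely resolve such checks
statically than call isInstance at parse time.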

Anyway, what about predefining commonly used tokens like
(DOUBLE|SINGLE)_QUOTED_STRING, INT, FLOAT, WHITESPACE, LINEBREAK, ...?
With the option to override them, of course.

Regards

Raphael

----- Message from antlr at mirality.co.nz ---------
     Date: Sun, 20 Jul 2008 13:16:05 +1200
     From: Gavin Lambert <antlr at mirality.co.nz>
Reply-To: Gavin Lambert <antlr at mirality.co.nz>
  Subject: Re: [antlr-interest] Tokenising for context specific reserved words
       To: Terence Parr <parrt at cs.usfca.edu>, Loring Craymer  
<lgcraymer at yahoo.com>
       Cc: antlr-interest <antlr-interest at antlr.org>


> At 08:58 19/07/2008, Terence Parr wrote:
>> There was an interesting paper called "Schrodinger's tokens"...
>> if you are a physics or quantum physics buff, you get the
>> reference ;)
>
> I haven't looked up the paper, but the mental images that the title
> alone conjures up sound quite cool :)
>
> All sorts of useful things could be done if the lexer could generate a
> token that was simultaneously a member of multiple types -- eg. on
> seeing the input "10", it could generate a token that could be used as
> either an INT_LITERAL or FLOAT_LITERAL, as parser context demanded (or
> even BINARY_LITERAL, depending on domain).  And on seeing "if", it
> could be used as either IF_KEYWORD or IDENTIFIER.
>
> Of course that's doable in ANTLR at the moment via helper parser rules,
> but it'd be cool if this were a first-party construct.  I guess it
> could be implemented either as multiple types stored against a
> particular instance of a token (which is the most flexible, but will
> slow down token comparisons somewhat), or as hierarchies of tokens (eg.
> in the examples above, any INT_LITERAL can be used in place of a
> FLOAT_LITERAL, but not the reverse; similarly, any IF_KEYWORD can be
> used as an IDENTIFIER, but not the reverse).  The second option is less
> flexible but it's probably sufficient for most scenarios I can think of
> at the moment, and I suspect it'd end up generating faster code (since
> most of it can be dealt with statically).


----- End message from antlr at mirality.co.nz -----
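
For comparison, the first option in the quoted message (several types
stored against one token instance) might look roughly like the Java
sketch below. The names TokenType, MultiToken, and MultiLexer.classify
are made up for illustration and do not correspond to any ANTLR API; the
regular expressions are assumptions chosen only to cover the "10" and
"if" examples.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical token types taken from the examples in the quoted message.
enum TokenType { INT_LITERAL, FLOAT_LITERAL, BINARY_LITERAL, IF_KEYWORD, IDENTIFIER }

// A token that carries every type it could legitimately be used as.
final class MultiToken {
    final String text;
    final Set<TokenType> types;

    MultiToken(String text, Set<TokenType> types) {
        this.text = text;
        this.types = types;
    }

    // Parser-side check: a set lookup instead of a single int comparison.
    boolean is(TokenType expected) {
        return types.contains(expected);
    }
}

final class MultiLexer {
    // Assigns all plausible types to one lexeme (illustrative rules only).
    static MultiToken classify(String text) {
        EnumSet<TokenType> types = EnumSet.noneOf(TokenType.class);
        if (text.equals("if")) {
            types.add(TokenType.IF_KEYWORD);
            types.add(TokenType.IDENTIFIER);
        } else if (text.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            types.add(TokenType.IDENTIFIER);
        } else if (text.matches("[0-9]+")) {
            types.add(TokenType.INT_LITERAL);
            types.add(TokenType.FLOAT_LITERAL);
            if (text.matches("[01]+")) {
                types.add(TokenType.BINARY_LITERAL);
            }
        } else if (text.matches("[0-9]+\\.[0-9]+")) {
            types.add(TokenType.FLOAT_LITERAL);
        }
        return new MultiToken(text, types);
    }
}
```

Here MultiLexer.classify("10") yields a token for which
is(TokenType.INT_LITERAL), is(TokenType.FLOAT_LITERAL), and
is(TokenType.BINARY_LITERAL) all hold. The set lookup on every match is
the extra comparison cost the quoted message mentions.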

