[antlr-interest] A proposal for keywords

Anthony Youngman Anthony.Youngman at eca-international.com
Wed May 31 08:06:16 PDT 2006


Having read the other responses ...
 
What's wrong with a filter to do this? I've always wanted to put a
stateful filter between the lexer and parser to deal with precisely this
sort of scenario. The filter can then change the token type depending on
what came before and after. I'm not sure how to specify it, but certain
sequences are invalid. My favourite example of what needs to be handled
...
 
REM: REM = REM(6,4); REM this shows four uses of REM, as a label,
variable, function and statement.
 
As I've commented previously, this is valid DATABASIC code...
 
Cheers,
Wol

________________________________

From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Loring Craymer
Sent: 23 May 2006 19:10
To: antlr-interest at antlr.org
Subject: [antlr-interest] A proposal for keywords



Handling keywords in grammars is an awkward problem.  Languages handle
keywords in one of two ways:  1.)  keywords are uniquely named, or 2.)
keyword names can be used for other language elements (variable names,
etc.).  ANTLR 2 preferentially supports option 1; PCCTS allowed either,
but directly supported neither (option 1 could be supported by adding
symbol table lookup in lexer actions; option 2 could be supported by
predicate hoisting).

 

It strikes me that the difference is solely a matter of when types are
bound to tokens.  For option 1, types are bound to tokens in the lexer.
For option 2, literal types should be bound in the parser:  that is, the
lexer binds the generic type to the token (TEXT or NUMBER, for example)
and the dynamically looks up the next token in the literals table
whenever attempting to match a literal.  That is, "if" would be first
typed as TEXT but matched (and retyped) as LITERAL_if when matching an
occurrence of "if" in the parser.

 

I was concerned that this might not work with the LL(*) DFAs of ANTLR 3,
until I realized that the predicate hoisting mechanism provides almost
all of the support needed.  (Some sort of type patching table may also
be required; "if" might be matched by a state that allowed either
LITERAL_if or TEXT as the type for that token; for a first
implementation, the type patching may not be necessary since tree
walkers could also do dynamic lookup when matching literals.  Patching,
though, seems preferable over the longer term.)

 

Comments, anyone?  As far as I can see, the only downside to providing
an option to select a keyword mechanism is that we will need to find a
replacement topic for ANTLR workshop discussions-I can remember being in
discussions on this topic at every one of the past workshops!

 

--Loring


* ************************************************************************ *

This transmission is intended for the named recipient only. It may contain private and confidential information. If this has come to you in error you must not act on anything disclosed in it, nor must you copy it, modify it, disseminate it in any way, or show it to anyone. Please e-mail the sender to inform us of the transmission error or telephone ECA International immediately and delete the e-mail from your information system.

Telephone numbers for ECA International offices are: Sydney +61 (0)2 8272 5300, Hong Kong + 852 2121 2388, London +44 (0)20 7351 5000 and New York +1 212 582 2333.

* ************************************************************************ *
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20060531/c177b725/attachment.html


More information about the antlr-interest mailing list