[antlr-interest] lexer: compound keywords with a twist

Edwards, Waverly Waverly.Edwards at genesys.com
Mon Aug 20 06:56:02 PDT 2007


 
Actually, Edit = numericVar is a case where Edit is also a reserved word;
it just performs another function. But I am starting to understand how
this works now.
"Compile Long If" is only one of roughly 60 three-token keywords.  I
don't know how many two-token keywords there are.
 
>It's easy enough to handle 'Compile' as identifier vs. 'Compile Long'
>as keyword, but...
 
Fortunately for me,  'Compile Long' is not legal so I can throw some of
those cases out.
 
>>Possibly, but that seems more like a job for the parser.  At the
>>parser level you can examine the surrounding context and then emit an
>>EditStatement or EditFunction into the AST.
 
I will study this all very carefully.
 
 
Thank you,
 
 
W.
 

________________________________

From: Gavin Lambert [mailto:antlr at mirality.co.nz] 
Sent: Monday, August 20, 2007 7:03 AM
To: Edwards, Waverly; antlr
Subject: Re: [antlr-interest] lexer: compound keywords with a twist


At 12:52 20/08/2007, Edwards, Waverly wrote:


	I'm a first-time ANTLR user and I have some questions that I
	need some assistance with.
	I am replicating an existing procedural BASIC dialect language
	compiler.  I actually have multiple issues to overcome but this
	is the first one.  The language has *hundreds* of keywords.
	Many of the keywords are actually compound keywords:
	
	Edit = numericVar 
	Edit Field 
	Edit Field Close 
	Edit Menu 
	Edit Text 
	Compile Long If 


For that case, my first cut attempt would be something along these lines
(not sure if it'll compile without warnings, but I think it's close):

EDIT_FIELD
  : 'Edit'
      (WS
        ('Field'
          (WS 'Close' { $type = EDIT_FIELD_CLOSE; }
          | /*nothing -- EDIT_FIELD*/
          )
        | 'Menu' { $type = EDIT_MENU; }
        | 'Text' { $type = EDIT_TEXT; }
        )
      | /* nothing */ { $type = IDENTIFIER; }
      )
  ;

(Where WS is defined to exclude newlines, unless your language supports
these multi-word keywords being broken across lines too.)

This is basically the "how you'd parse it by eye" approach.  (Though
it'll be more complicated if you want to be case-insensitive as
well...)

The last case I'm a little unsure about.  It's easy enough to handle
'Compile' as identifier vs. 'Compile Long' as keyword, but treating
'Compile Long If' as a keyword and 'Compile Long Foo' as three
identifiers would be tricky, and would probably require emitting
multiple tokens from a single lexer rule.  (It becomes easier again if
you can treat some of these cases as illegal.)
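
To make the longest-match idea concrete outside of ANTLR, here is a
minimal hand-written sketch in Python. It is not ANTLR output and not
the grammar above; the keyword table, the token names and the
tokenize_line helper are all hypothetical. It tries the longest run of
words first and falls back to plain identifiers, so 'Compile Long Foo'
comes out as three identifiers without any special casing, and matching
is case-insensitive. It works one line at a time, so a compound keyword
cannot span a newline.

```python
import re

# Hypothetical keyword table: lower-cased word sequences -> token type.
# The real dialect would have hundreds of entries.
COMPOUND_KEYWORDS = {
    ("edit", "field"): "EDIT_FIELD",
    ("edit", "field", "close"): "EDIT_FIELD_CLOSE",
    ("edit", "menu"): "EDIT_MENU",
    ("edit", "text"): "EDIT_TEXT",
    ("compile", "long", "if"): "COMPILE_LONG_IF",
}
MAX_WORDS = max(len(key) for key in COMPOUND_KEYWORDS)

WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# A raw token is a word, a run of digits, or any single non-space character.
RAW = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|\S")


def tokenize_line(line):
    """Return (type, text) pairs for one source line."""
    raw = RAW.findall(line)
    tokens = []
    i = 0
    while i < len(raw):
        if not WORD.fullmatch(raw[i]):
            tokens.append(("OTHER", raw[i]))
            i += 1
            continue
        # Try the longest candidate first, then shorter ones.
        for n in range(min(MAX_WORDS, len(raw) - i), 0, -1):
            chunk = raw[i:i + n]
            key = tuple(word.lower() for word in chunk)
            if key in COMPOUND_KEYWORDS:
                tokens.append((COMPOUND_KEYWORDS[key], " ".join(chunk)))
                i += n
                break
        else:
            # No keyword starts here: emit a single identifier.
            tokens.append(("IDENTIFIER", raw[i]))
            i += 1
    return tokens
```

With this table, tokenize_line("Edit Field Close") yields one
EDIT_FIELD_CLOSE token, while tokenize_line("Compile Long Foo") yields
three IDENTIFIER tokens. As in the rule above, a bare 'Edit' falls back
to IDENTIFIER here.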



	2.  Is it possible to deal with variable length keywords at the
	lexer level?
	
	stringVar = Edit$( vNumParam ) 
	Edit$( vNumParam ) = stringVar 


Possibly, but that seems more like a job for the parser.  At the parser
level you can examine the surrounding context and then emit an
EditStatement or EditFunction into the AST. 
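
As a rough illustration of deciding by context, here is a small Python
sketch (again hand-written, not ANTLR; parse_statement, the node names
as tuples, and the tokenizer are hypothetical). It assumes that Edit$(
... ) at the start of a statement followed by '=' is the statement
form, and that any other use is the function form. It is
case-sensitive and only looks at the first Edit$ on the line, which a
real parser would not get away with.

```python
import re

# Words may end in '$' (e.g. Edit$); everything else is digits or one symbol.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\$?|\d+|\S")


def _matching_paren(toks, open_index):
    """Index of the ')' that closes toks[open_index], or None if unbalanced."""
    depth = 0
    for i in range(open_index, len(toks)):
        if toks[i] == "(":
            depth += 1
        elif toks[i] == ")":
            depth -= 1
            if depth == 0:
                return i
    return None


def parse_statement(line):
    """Classify one line as an EditStatement, an EditFunction use, or neither."""
    toks = TOKEN.findall(line)

    # Statement form: Edit$( args ) = value, with Edit$ leading the line.
    if toks[:2] == ["Edit$", "("]:
        close = _matching_paren(toks, 1)
        if close is not None and toks[close + 1:close + 2] == ["="]:
            return ("EditStatement", toks[2:close], toks[close + 2:])

    # Function form: Edit$( args ) appearing anywhere else.
    if "Edit$" in toks:
        start = toks.index("Edit$")
        if toks[start + 1:start + 2] == ["("]:
            close = _matching_paren(toks, start + 1)
            if close is not None:
                return ("EditFunction", toks[start + 2:close])
    return None
```

The lexer only ever sees one Edit$ token type here; the choice between
the two node kinds is made from the tokens around it.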