[antlr-interest] lexer: compound keywords with a twist

Mon Aug 20 07:57:56 PDT 2007

It sounds like you are falling in to the trap of trying to do too much
with the lexer. Valid combinations of keywords should be specified in
the parser rules basically, and the lexer should just pick out words
from the input stream and tokenize them. This includes making '$' a
token and specifying that the edit keyword followed by $ is a valid
atomic expression in the 'last' rule of your expression handler. You can
crib expression handling from say the Java or X example grammars (pick
whichever language seems closest to expressions in your language and
just cull out the bits you don't need/support.

It looks like you may need case insensitivity, so override the LA
function in the input (as per wiki article) to make it return UPPER CASE
only  (the lexer will then match any case for tokens but return the
original text for the token if asked.

So, this will give you something like:

statement

                : assignment

                | edit_stuff

                ;

assignment

                : var_keyw OPEQ expr

                ;

edit_stuff

                : EDIT FIELD

                                statsforthat

                  EDIT FIELD CLOSE

                | EDIT MENU ... etc

                ;

var_keyw

                : ID

                | EDIT

                ;

expr       : ....

atom

                : func_keyw DOLLAR LPAREN expr_list* RPAREN

                | ...

EDIT       : 'EDIT' ;

FIELD     : 'FIELD' ;

TEXT      : 'TEXT' ;

DOLLAS                : '$' ;

Etc.

In other words, if it isn't obvious how to deal with things in the
lexer, then you are likely trying to build a context sensitive lexer,
which USUALLY means that you need to construct the rules in the parser.

The parser will USUALLY not try to verify things semantically, so for
instance it might accept anything that looks like a function reference,
ten when it has said that this is valid syntactically, it may pass on a
tree to a tree walker that works out if the function has been declared,
is a valid intrinsic and so on and reject things that are semantically
invalid even though they look fine syntactically.

Hope that helps.

Jim

From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Edwards, Waverly
Sent: Sunday, August 19, 2007 5:52 PM
To: antlr
Subject: [antlr-interest] lexer: compound keywords with a twist

Hello, 

I'm a first time ANTLR user and I have some questions that I need some
assistanc with. 
I am replicating an existing procedural BASIC dialect language compiler.
I actually have 
multiple issues to overcome but this is the first one.  The language has
*hundreds* of keywords. 
Many of the keywords are actually compound keywords 

Edit = numericVar 
Edit Field 
Edit Field Close 
Edit Menu 
Edit Text 
Compile Long If 

1.  How would you define compound keywords?  The keyword "Edit" has 5
different starts 
and the longest usage is a combination of three words 

2.  Is it possible to deal with variable length keywords at the lexer
level. 

stringVar = Edit$( vNumParam ) 
Edit$( vNumParam ) = stringVar 

I have two instances where Edit$ ( different from Edit ) can be used.
One as a function 
and the other a procedure.  If Edit$ is on the left side of a variable
it's a procedure and 
if on the right it's a function.  The perform different tasks.  Is there
a way at the lexer 
level that I can pass the parser something like EditStatement vs
EditFunction depending 
on what the lexer sees first. 

Thank you, 

W. 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20070820/a13f277a/attachment-0001.html