[antlr-interest] Lexer lookahead problems

Bharath Sundararaman Bharath.Sundararaman at starthis.com
Wed Apr 6 06:16:40 PDT 2005


1) If EQ can be a keyword, define it in the tokens section, tokens { EQ = "EQ"; }, and then write the IDENT rule as:

IDENT options {testLiterals=true;}
: ('a'..'z' | 'A'..'Z') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '$')*;

Finally, you could have a parser rule of the form:
assign: IDENT EQ expression; 
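
To make the idea behind testLiterals concrete, here is a rough sketch in plain Python (not ANTLR, and not the code ANTLR generates). It assumes the lexer first matches the longest identifier and only then looks the complete text up in a literal table. The names LITERALS and lex_word are made up for illustration, and GT and LT are added only because the original question mentions them.

```python
import re

# Hypothetical literal table, standing in for a tokens { EQ = "EQ"; ... } section.
LITERALS = {"EQ": "EQ", "GT": "GT", "LT": "LT"}

# Same shape as the IDENT rule: a letter, then letters, digits, '_' or '$'.
IDENT_RE = re.compile(r"[A-Za-z][A-Za-z0-9_$]*")


def lex_word(text, pos=0):
    """Match one identifier at pos, then retype it if its full text is a literal."""
    m = IDENT_RE.match(text, pos)
    if m is None:
        return None  # no identifier starts at this position
    word = m.group(0)
    # The lookup uses the whole matched word, so "EQU" is not mistaken for "EQ".
    return (LITERALS.get(word, "IDENT"), word)
```

Because the lookup happens after the whole word has been matched, lex_word("EQ") comes back typed as EQ while lex_word("EQU") stays an IDENT.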


2) If EQ has to be a lexer rule, you could do this:

protected EQ: "EQ"; // To avoid clash with IDENT
IDENT:
( ("EQ") {$setType(EQ);}
| ('a'..'z' | 'A'..'Z') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '$')*
);

NOTE: The rule above is only an example. With it, whenever "EQ" is matched, the token type is automatically set to EQ, which means you cannot have an identifier that starts with "EQ", such as "EQU".

However, if EQ will always be followed by, let's say, the rule HASH (though it doesn't make any sense), you could do this:

(("EQ") (HASH))=>"EQ"{$setType(EQ);}

In this case, "EQ" is typed as EQ only when it is followed by HASH; otherwise it is matched as an ordinary identifier.
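
The predicate idea can be sketched the same way in plain Python (again, not ANTLR). This assumes HASH is the single character '#', which the example above does not actually say, and lex_with_predicate is a made-up name. The lookahead only peeks at the input; when it succeeds, just "EQ" is consumed and HASH is left for the next token.

```python
import re

# Same shape as the IDENT rule: a letter, then letters, digits, '_' or '$'.
IDENT_RE = re.compile(r"[A-Za-z][A-Za-z0-9_$]*")

HASH = "#"  # assumption: the text of the HASH rule is not given in the example


def lex_with_predicate(text, pos=0):
    """Return EQ only when "EQ" is immediately followed by HASH, else try IDENT."""
    # Lookahead check: peek at "EQ" + HASH without consuming any input.
    if text.startswith("EQ" + HASH, pos):
        return ("EQ", "EQ")  # only "EQ" is consumed; HASH stays in the input
    m = IDENT_RE.match(text, pos)
    if m is None:
        return None  # neither alternative applies at this position
    return ("IDENT", m.group(0))
```

With this check in place, "EQ" on its own and "EQU" both come back as IDENT, and only "EQ#" produces an EQ token.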
 
Hope this helps.

Bharath.


________________________________________
From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Peter Kronenberg
Sent: Wednesday, April 06, 2005 7:39 AM
To: antlr-interest at antlr.org
Subject: [antlr-interest] Lexer lookahead problems

I'm trying to write a lexer for expressions which accepts both character and symbol comparisons, e.g., =, >, <, as well as EQ, GT, LT.
But the lexer reports nondeterminism between EQ and my IDENT rule, naturally.  Increasing the lookahead doesn't seem to help.  Is there a way to fix this?
EQ : "=" | "EQ";
IDENT: ('a'..'z' | 'A'..'Z') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '$')*;

Peter Kronenberg 
Software Engineer 
(703) 885-1222 
pkronenberg at technicacorp.com 

