[antlr-interest] Sample scannerless parser with AST construction in unmodified ANTLR

Peter Kooiman peter at crispu.com
Sun Apr 17 03:35:44 PDT 2011


Ter, 

First of all, let me explain that the only reason I'm being such a nuisance is that I really want this to work! However, I'm afraid that in the end, ANTLR falls just short of being a scannerless tool. 

The problem lies in distinguishing keywords from identifiers that happen to start with the same letters as a keyword. 
The sample at http://bit.ly/gT3Q1C cannot distinguish between "returnme;" and "return me;", because kreturn is expressed as: 

kreturn : 'r' 'e' 't' 'u' 'r' 'n' ws? ; 

My first thought was to simply make the whitespace mandatory. But in C, for example, we can have 
return; 
return me; 

whereas "returnme;" would be a syntax error. So making ws mandatory is not an option, because "return;" must still be accepted. What is really needed is a way to express "'r' 'e' 't' 'u' 'r' 'n' followed by anything that can NOT be part of an identifier". Although you could rewrite the return statement rule to something awful like 

retstat: kreturn ws? colon 
         | kreturn ws id colon 
         ; 

the underlying problem remains: there is no way to prevent ANTLR from entering rule kreturn upon seeing an identifier like "returnme" that happens to start with the same letters as the keyword "return". In Rats!, you would write:

KRETURN = "return" !LetterOrDigit ws? ;
 
where the "!" operator denotes a syntactic predicate meaning "LetterOrDigit must not match at this point, and the input it was tested against is not consumed".
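
To make the intended semantics concrete, here is a rough sketch in Python of what that Rats! rule does. It is neither ANTLR nor Rats! code, the function name match_kreturn is made up for illustration, and Python's isalnum() only approximates LetterOrDigit (it does not treat "_" as an identifier character, for instance):

```python
def match_kreturn(text, pos=0):
    """Return the position after "return" and any trailing whitespace,
    or None if the keyword does not match at pos."""
    keyword = "return"
    if not text.startswith(keyword, pos):
        return None
    end = pos + len(keyword)
    # Negative lookahead, the "!" in the Rats! rule: fail if a letter or
    # digit follows, but never consume the character that was inspected.
    if end < len(text) and text[end].isalnum():
        return None
    # Optional trailing whitespace, the "ws?" part of the rule.
    while end < len(text) and text[end].isspace():
        end += 1
    return end


for sample in ("return;", "return me;", "returnme;"):
    print(sample, "->", match_kreturn(sample))
```

With this check in place, "return;" and "return me;" match the keyword (ending at positions 6 and 7), while "returnme;" is rejected before any input is committed to the keyword rule, which is exactly the behaviour I cannot find a way to express.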

Without the ability to express "something followed by anything that is not a letter or digit", I don't see how to get it right in ANTLR. I very much hope I am wrong though!


