[antlr-interest] Sample scannerless parser with AST construction in unmodified ANTLR
Peter Kooiman
peter at crispu.com
Sun Apr 17 03:35:44 PDT 2011
Ter,
First of all, let me explain that the only reason I'm being such a nuisance is that I really want this to work! However, I'm afraid that in the end, ANTLR falls just short of being a scannerless tool.
The problem lies in distinguishing keywords from identifiers that happen to start with the same letters as a keyword.
The sample at http://bit.ly/gT3Q1C cannot distinguish between "returnme;" and "return me;", because kreturn is expressed as:
kreturn : 'r' 'e' 't' 'u' 'r' 'n' ws? ;
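To make the problem concrete, here is a minimal Python sketch of what that rule does at the character level. It is not ANTLR code, and the function name match_kreturn_naive is made up purely for illustration: it consumes the six letters and any trailing whitespace, with no check on what comes next.

```python
def match_kreturn_naive(text, pos=0):
    """Mimic  kreturn : 'r' 'e' 't' 'u' 'r' 'n' ws? ;  on a plain string.

    Returns the position just past the match, or None if the six
    letters are not present at pos. (Hypothetical helper, for illustration.)
    """
    keyword = "return"
    if not text.startswith(keyword, pos):
        return None
    pos += len(keyword)
    # ws? : skip optional whitespace
    while pos < len(text) and text[pos] in " \t\r\n":
        pos += 1
    return pos


# Both inputs are accepted as starting with the keyword:
print(match_kreturn_naive("return me;"))  # matches "return " 
print(match_kreturn_naive("returnme;"))   # also matches "return"
```

Since both calls succeed, nothing in the rule itself tells the two inputs apart.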
My first thought was to just make the whitespace mandatory. But in C, for example, we can have
return;
return me;
whereas "returnme;" must not be parsed as a return statement at all (in C it is an expression statement consisting of the identifier returnme). So making ws mandatory is not an option; what is really needed is a way to express
"'r' 'e' 't' 'u' 'r' 'n' followed by anything that can NOT be part of an identifier". Although you could rewrite the return statement rule to something awful like
retstat: kreturn ws? colon
| kreturn ws id colon
;
the underlying problem remains: there is no way to prevent ANTLR from entering rule kreturn upon seeing an identifier like "returnme" that happens to start with the same letters as the keyword "return". In Rats!, you would write
KRETURN = "return" !LetterOrDigit ws? ;
where the "!" operator denotes a syntactic predicate meaning "LetterOrDigit must not match, and corresponding input will not be consumed".
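For readers who have not used PEG-style predicates, the behaviour of that Rats! rule can be sketched in a few lines of Python. This is only an illustration of the semantics, not Rats! or ANTLR code; the function name match_kreturn is hypothetical, and treating LetterOrDigit as Python's str.isalnum() is an assumption.

```python
def match_kreturn(text, pos=0):
    """Mimic  KRETURN = "return" !LetterOrDigit ws? ;  on a plain string.

    Returns the position just past the match, or None on failure.
    (Hypothetical helper, for illustration only.)
    """
    keyword = "return"
    if not text.startswith(keyword, pos):
        return None
    pos += len(keyword)
    # !LetterOrDigit : peek at the next character WITHOUT consuming it;
    # fail if it could continue an identifier.
    # Assumption: LetterOrDigit is approximated by str.isalnum().
    if pos < len(text) and text[pos].isalnum():
        return None
    # ws? : skip optional whitespace
    while pos < len(text) and text[pos] in " \t\r\n":
        pos += 1
    return pos


for sample in ("return;", "return me;", "returnme;"):
    print(repr(sample), "->", match_kreturn(sample))
```

The lookahead only inspects the character at pos and never advances past it, which is exactly the "must not match, and input is not consumed" behaviour described above.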
Without the ability to express "something followed by anything that is not a letter or digit", I don't see how to get it right in ANTLR. I very much hope I am wrong though!