[antlr-interest] Sample scannerless parser with AST construction in unmodified ANTLR

Jason Doege jason.doege at doege.com
Sun Apr 17 08:56:47 PDT 2011


I mentioned this idea in a prior message, calling it a "zero-width 
negative look-ahead assertion", mainly because I am unfamiliar with the 
term "syntactic predicate". I didn't say much about implementation, though.

Two suggestions for this would be to allow ANTLR to:

1) treat:

kreturn : 'return' ws? ;

as if it were:

kreturn : 'r' 'e' 't' 'u' 'r' 'n' ws? ;

2) permit the use of regular expressions instead of simple strings in 
such instances, which would then make that production look like:

kreturn : 'return\s?' ;

and, to also deal with the keyword problem:

kreturn : 'return(?!\w)' ;

where (?!\w) looks at but does not consume the next character on the 
stream, and the test passes only if that character is not a word 
character ( \w matches a-z, A-Z, 0-9 or _ ) or if there is no next 
character at all.
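
To illustrate the look-ahead behaviour I have in mind, outside of ANTLR 
entirely, here is a minimal sketch using Python's re module. The names 
KRETURN and starts_with_return_keyword are made up for this example, and 
it uses the negative look-ahead (?!\w), i.e. "not followed by a word 
character", which also succeeds at end of input:

```python
import re

# Hypothetical pattern for illustration: the keyword followed by a
# zero-width check that the next character is NOT a word character.
# The look-ahead inspects the next character without consuming it.
KRETURN = re.compile(r"return(?!\w)")


def starts_with_return_keyword(text: str) -> bool:
    """True if text begins with the keyword 'return', not an identifier."""
    return KRETURN.match(text) is not None


if __name__ == "__main__":
    for sample in ("return;", "return me;", "returnme;"):
        print(repr(sample), starts_with_return_keyword(sample))
```

Only the six characters of "return" are consumed by a successful match; 
the character that was inspected stays on the stream for the next rule.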

I apologize for my ignorance of ANTLR architecture. I have no idea if or 
how the architecture of ANTLR would support this.

On a related note, it looks to me that when you have something like:

kreturn : 'return' ws?;

the lexer is automatically created with a lexical element 'return', 
whereas my initial expectation was that 'return' would only be tested 
for at that particular point in the parsing process. I think limiting 
the parser to considering only the rule at hand is crucial to obtaining 
the flexibility implied by a scanner-less parser generator.
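
What I mean by testing for 'return' only at the point where the rule is 
tried can be sketched by hand, without any lexer. This is not ANTLR 
code and says nothing about how ANTLR works internally; match_keyword 
and is_ident_char are hypothetical helpers, and I am assuming that an 
identifier character is a letter, a digit or an underscore:

```python
from typing import Optional


def is_ident_char(ch: str) -> bool:
    # Assumption: identifiers consist of letters, digits and '_'.
    # str.isalnum() also accepts non-ASCII letters and digits.
    return ch.isalnum() or ch == "_"


def match_keyword(text: str, pos: int, keyword: str) -> Optional[int]:
    """Try to match `keyword` at `pos`, directly on the character stream.

    Returns the position after the keyword and any trailing whitespace,
    or None if the keyword does not match here.
    """
    if not text.startswith(keyword, pos):
        return None
    end = pos + len(keyword)
    # Not-predicate: peek at the next character without consuming it.
    # If it could continue an identifier, this is not the keyword.
    if end < len(text) and is_ident_char(text[end]):
        return None
    # Equivalent of the optional `ws?` after the keyword.
    while end < len(text) and text[end].isspace():
        end += 1
    return end
```

With this, match_keyword("returnme;", 0, "return") yields None, so a 
caller is free to try an identifier rule at the same position instead.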

Best regards,
Jason Doege

On 4/17/2011 5:35 AM, Peter Kooiman wrote:
> Ter,
>
> First of all, let me explain that the only reason I'm being such a nuisance is that I really want this to work! However, I'm afraid that in the end, ANTLR falls just short of being a scannerless tool.
>
> The problem lies with distinguishing between keywords, and identifiers that happen to start with the same letters as a keyword.
> The sample at http://bit.ly/gT3Q1C cannot distinguish between "returnme;" and "return me;", because kreturn is expressed as:
>
> kreturn : 'r' 'e' 't' 'u' 'r' 'n' ws? ;
>
> My first thought was, just make the whitespace not optional. But, in C for example, we can have
> return;
> return me;
>
> whereas "returnme;" would be a syntax error. Now, making ws not optional is no longer possible; what is really needed is a way to express
> "'r' 'e' 't' 'u' 'r' 'n' followed by anything that can NOT be part of an identifier". Although you could re-write the return statement rule to something awful like
>
> retstat: kreturn ws? colon
>           | kreturn ws id colon
>           ;
>
> the underlying problem remains: there is no way to prevent ANTLR entering rule kreturn upon seeing an identifier like "returnme" that happens to start with the same letters as keyword "return". In Rats!, you would write
>
> KRETURN = "return" !LetterOrDigit ws? ;
>
> where the "!" operator denotes a syntactic predicate meaning "LetterOrDigit must not match, and corresponding input will not be consumed"
>
> Without the ability to express "something followed by anything that is not a letter or digit", I don't see how to get it right in ANTLR. I very much hope I am wrong though!


