[antlr-interest] Struggling with recursion error

Sat Dec 12 10:32:44 PST 2009

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of André Næss
> Sent: Saturday, December 12, 2009 8:23 AM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] Struggling with recursion error
> 
> Hi
> 
> I've been trying for quite some time now to put together a minimal toy
> parser, but I'm completely stuck on this recursion error that I don't
> understand.
> 
> The parser I'm trying to write is a simple parser for a language that
> has only let-bindings, if-then-else, integers, variables, and function
> application. The problem is the latter. A function application
> typically looks like this:
> 
>    foo a b 10
> 
> And I need this to parse as (foo a b 10), not (foo (a (b 10))) (I want
> to be able to pass functions as values).
> 
> After some back and forth I figured that by making the topmost expr
> match either a function application or any other expression, where
> function applications all begin with ID, I should get the correct
> parse. And indeed, I do, with backtrack on. If I turn off backtrack,
> the whole thing fails with

Yes - this isn't the way to do it, you need to combine your ID and function rules in some way. I think that what you are lacking is understanding of LL(n) parsing and how the ANTLR rule layout relates to this. The things in your top level expression have the lowest precedence and the things down at the end of the chain (your atoms) have the highest. When seeing an ID your current rules will pick the higher precedence, which is your atom. So, you need to left factor those as they are ambiguous. 

However a bigger problem is that you have a completely ambiguous language, which would only work if this was a runtime interpreted script as without parenthesis to delimit the function arguments, you cannot know what is an ID and what is a function. So, we can make it parse as foo a b 10, but once you see ID, you will consume everything else as arguments to your function up to the ';'. Are you trying to design a language or write something to parse an existing language? The nearest I can see to something like this is VBScript, which was horrendous to write a parser for and the only way to do it was using global backtracking. However, even VBScript is not this ambiguous. 

Designing a language is a difficult thing and you will need a lot more experience before being able to do this well. However, as there are so many horrible languages out there already now, then why not one that is completely logically invalid ;-)

So, if we rework your example then you will find you get:

[10:21:11] error(211): T.g:25:12: [fatal] rule atomExpr has non-LL(*) decision due to recursive rule invocations reachable from alts 1,2.  Resolve by left-factoring or using syntactic predicates or using backtrack=true option.
[10:21:11] error(211): T.g:25:8: [fatal] rule atomExpr has non-LL(*) decision due to recursive rule invocations reachable from alts 1,2.  Resolve by left-factoring or using syntactic predicates or using backtrack=true option.

Because there is no way to decide between continuing to consume arguments or ending the list of them. Generally the dynamic languages that think it is good to lose things like ',' and '(' ')' just create huge ambiguities. I have never really understood why people think it such a pain to type ';' at the end of the statement - productivity gained through compiler errors and warnings far outweighs any 'pain' of typing ';'.

So, we can make your grammar work if we delimit the function arguments with '(' ')' and use a single token predicate. However, unless you are trying to parse something that already exists and have no choice, then I would abandon this avenue and first work your way through the book and use the examples downloaded from the download page to see how things are solved for languages you know such as Java or C etc. I would also recommend that you don't use 'literals' in your parser but create lexer rules as the literals can lead you astray when learning and are a pain to deal with when processing error messages.

grammar T;

options {
  output=AST;
}

tokens {
FUNC_APPL;
IFELSE;
LET;
VAR_REF;
}

program
 	:	expr (';' expr)*
 	;

expr    :	ifElseExpr
 	|	letExpr
 	|	atomExpr
 	;

atomExpr 
 	:	INT
 	|	ID (    ('(')=>'(' expr* ')' 
 	               -> ^(FUNC_APPL ID expr*)
 	             | -> ^(VAR_REF ID)
 	           )
 	|	'('! expr^ ')'!
 	;

letExpr
 	:	'let' letBinding+ (',' letBinding)* 'in' expr -> ^(LET letBinding+ expr)
 	;

letBinding
 	:	(ID^ '='! expr)
 	;

ifElseExpr
 	:	'if' cond=expr 'then' thenPart=expr elsePart='else' expr 

 		-> ^(IFELSE $cond $thenPart $elsePart)
 	;

INT : ('0'..'9')+ ;
ID : ('a'..'z'|'A'..'Z')+ ;

Jim