[antlr-interest] How does one handle variable number of function parameters?

Tue Nov 29 07:28:38 PST 2005

On 11/29/05, Rob Greene <robgreene at gmail.com> wrote:
> >    function_call : ID LPAREN^ args RPAREN!;
> >    args  :  expr ( COMMA! expr )* { ## = #( #[ARGLIST,"ARGLIST"], ## ); } ;
>
> 1) You've made the LPAREN the root - why? I figured I'd have the
> IDENTIFIER as the root with the expressions as the children. Probably
> doesn't matter if I get the tree parser setup right...?

The LPAREN as the root helps to distinguish between an identifier and a function
call with no arguments.  You can probably set up the tree parser correctly, but
I've found it simpler to have different node types for different operations.

<oops>
    I see that your grammar requires at least one argument to a function call;
    I did the same thing, and didn't even notice it.  Was this intentional, so
    you reject things like "date()" as illegal?
</oops>

Generally, if I see lots of references to the type attribute and to the
shape of the tree, I find it worthwhile to introduce a new node type.
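
To make the point about node types concrete, here is a rough C++ sketch (plain
C++, not ANTLR-generated code).  The Node struct and every function name in it
are made up for illustration; only the ID, LPAREN and ARGLIST token names come
from the grammar.  It compares an LPAREN-rooted call tree with an ID-rooted one
for "date" versus "date()":

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an AST node; ANTLR's own AST classes are not used.
enum NodeType { ID, LPAREN, ARGLIST };

struct Node {
    NodeType type;
    std::string text;
    std::vector<Node> children;
};

// Shape built by "ID LPAREN^ args RPAREN!": #( LPAREN ID #( ARGLIST ... ) )
Node lparenRootedCall(const std::string& name, const std::vector<Node>& args) {
    return Node{LPAREN, "(", {Node{ID, name, {}}, Node{ARGLIST, "ARGLIST", args}}};
}

// Alternative shape: ID as the root, arguments as its direct children.
Node idRootedCall(const std::string& name, const std::vector<Node>& args) {
    return Node{ID, name, args};
}

Node plainIdentifier(const std::string& name) {
    return Node{ID, name, {}};
}

// With LPAREN as the root, one type comparison identifies a call.
bool isCall(const Node& n) {
    return n.type == LPAREN;
}

// Name of the called function; only meaningful for call nodes.
const std::string& calleeName(const Node& call) {
    assert(isCall(call));
    return call.children[0].text;
}

// True when two nodes cannot be told apart by type and child count.
bool looksTheSame(const Node& a, const Node& b) {
    return a.type == b.type && a.children.size() == b.children.size();
}
```

With ID as the root, "date" and "date()" both come out as a childless ID node,
so looksTheSame() reports them as indistinguishable.  With LPAREN as the root,
the node type alone settles it.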

> 2) What the heck does this mean? I've seen it mentioned in the ANTLR
> documentation, but I didn't grok it.
>     args  :  expr ( COMMA! expr )* { ## = #( #[ARGLIST,"ARGLIST"], ## ); } ;

(just love that word; "grok").

As Martin said, this introduces a wrapper node.  It wraps the expressions in
one node so that the shape of the tree is well-defined.  I find a tree much
easier to walk when I know that only certain node types have a variable number
of children, and a wrapper node guarantees exactly that.
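
A rough C++ sketch of what the wrapper buys you.  Node, wrapArgs, makeCall and
argCount are hypothetical names used only for this illustration, not part of
ANTLR; the point is just that a call node always has exactly two children and
only ARGLIST varies in size:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an AST node; ANTLR's own AST classes are not used.
enum NodeType { ID, LPAREN, ARGLIST };

struct Node {
    NodeType type;
    std::string text;
    std::vector<Node> children;
};

// Wrap any number of argument expressions (including none) in one ARGLIST
// node, which is the job of the { ## = #( #[ARGLIST,"ARGLIST"], ## ); } action.
Node wrapArgs(const std::vector<Node>& exprs) {
    return Node{ARGLIST, "ARGLIST", exprs};
}

// A call is then always #( LPAREN ID ARGLIST ): exactly two children.
Node makeCall(const std::string& name, const std::vector<Node>& exprs) {
    return Node{LPAREN, "(", {Node{ID, name, {}}, wrapArgs(exprs)}};
}

// Because the shape is fixed, the argument count is a single lookup.
std::size_t argCount(const Node& call) {
    assert(call.type == LPAREN && call.children.size() == 2);
    assert(call.children[1].type == ARGLIST);
    return call.children[1].children.size();
}
```

Without the wrapper, the call node itself would have 1 + N children, and every
piece of code touching it would have to remember that the first child is
special.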

Going back to the <oops> above, if functions can be called with zero or more
arguments, then the line-noise above can be changed to this:

    args :
        ( expr ( COMMA! expr )* )?
        { ## = #( #[ARGLIST,"ARGLIST"], ## ); }
        ;

    // perhaps this is easier to understand.  it does (i hope :-) the same
    // thing with an extra production.

    exprlist :
        ( expr ( COMMA! expr )* )?
        ;

    args :
        elist:exprlist
        { #args = #( #[ARGLIST,"ARGLIST"], #elist ); }
        ;

In either case, you are guaranteed to have an ARGLIST node for the argument
list, even if there are no arguments.  If you need to print out the function
call later, that also gives you a production into which you can place the
parens and commas:

    arglist:
        #( ARGLIST
            { cout << "("; }
            (
                expr
                (
                    // commas *between* expressions, not *after* expressions
                    { cout << ", "; }
                    expr
                )*
            )?
            { cout << ")"; }
        )
    ;
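
The same walk, sketched in plain C++ instead of a tree grammar, in case that is
easier to follow.  The Node struct and the function names are assumptions made
for illustration, and expressions are reduced to identifiers and nested calls:

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for an AST node; ANTLR's own AST classes are not used.
enum NodeType { ID, LPAREN, ARGLIST };

struct Node {
    NodeType type;
    std::string text;
    std::vector<Node> children;
};

void printExpr(std::ostream& out, const Node& n);

// Mirrors the arglist rule: "(" first, ", " between expressions, ")" last.
void printArgList(std::ostream& out, const Node& arglist) {
    assert(arglist.type == ARGLIST);
    out << "(";
    for (std::size_t i = 0; i < arglist.children.size(); ++i) {
        if (i > 0) out << ", ";  // between expressions, not after them
        printExpr(out, arglist.children[i]);
    }
    out << ")";
}

// A call is #( LPAREN ID ARGLIST ); anything else is printed as its text.
void printExpr(std::ostream& out, const Node& n) {
    if (n.type == LPAREN) {
        assert(n.children.size() == 2);
        out << n.children[0].text;
        printArgList(out, n.children[1]);
    } else {
        out << n.text;
    }
}

// Convenience wrapper that collects the output in a string.
std::string render(const Node& n) {
    std::ostringstream out;
    printExpr(out, n);
    return out.str();
}
```

Because the ARGLIST node is always present, a call with no arguments needs no
special case: the loop simply runs zero times and "date()" comes out with its
parens intact.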

> 3) I'd like to have the parenthesis optional. Do I want to have two
> definitions to pick up the parenthesis in the parser grammar file?
>         | (IDENTIFIER LPARN) => IDENTIFIER^ LPAREN! expression (COMMA!
> expression)* RPAREN!
>         | IDENTIFIER^

I'm sorry, I don't grok the purpose of this...  This would be equivalent to
your original grammar with the argument list made optional.