[antlr-interest] Using C++ types in an ANTLR-generated C parser

Christopher L Conway cconway at cs.nyu.edu
Wed Feb 24 07:50:59 PST 2010


I'm trying to use a parser generated by ANTLR v3.2 in a C++ project.
The grammar uses C as the output language, and I compile the generated
code as C++. I'm having trouble using C++ types inside parser actions.
Here's a C++ header file defining a few types I'd like to use in the
parser:

    /* expr.h */
    #include <string>

    enum Kind {
      PLUS,
      MINUS
    };

    class Expr { // stub
    };

    class ExprFactory {
    public:
      Expr mkExpr(Kind kind, Expr op1, Expr op2);
      Expr mkInt(std::string n);
    };

And here's a simple parser definition:

    /* Expr.g */
    grammar Expr;

    options {
      language = 'C';
    }

    @parser::includes {
      #include "expr.h"
    }

    @members {
      ExprFactory *exprFactory;
    }

    start returns [Expr expr]
      : e = expression EOF { $expr = e; }
      ;

    expression returns [Expr e]
      : TOK_LPAREN k=builtinOp op1=expression op2=expression TOK_RPAREN
        { e = exprFactory->mkExpr(k,op1,op2); }
      | INTEGER { e = exprFactory->mkInt((char*)$INTEGER.text->chars); }
      ;

    builtinOp returns [Kind kind]
      : TOK_PLUS { kind = PLUS; }
      | TOK_MINUS { kind = MINUS; }
      ;

    TOK_PLUS : '+';
    TOK_MINUS : '-';
    TOK_LPAREN : '(';
    TOK_RPAREN : ')';
    INTEGER : ('0'..'9')+;

The grammar runs through ANTLR just fine. When I try to compile
ExprParser.c, I get errors like

 1. `conversion from ‘long int’ to non-scalar type ‘Expr’ requested`
 2. `no match for ‘operator=’ in ‘e = 0l’`
 3. `invalid conversion from ‘long int’ to ‘Kind’`

In each case, the statement is an initialization of an `Expr` or
`Kind` value to `NULL`.
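
To make the failure concrete, here is a minimal sketch of the same initializations outside of ANTLR. It assumes the generated code assigns `NULL` to a single return value, as the errors above suggest; the function names are hypothetical and are not taken from ExprParser.c.

```cpp
#include <cassert>
#include <cstddef>  // NULL

enum Kind { PLUS, MINUS };
class Expr {};  // stub, as in expr.h

// Hypothetical stand-ins for generated rule functions (assumption:
// each one initializes its return variable before the rule body runs).
Expr* pointerReturn() {
  Expr* e = NULL;  // fine: NULL converts to any pointer type
  return e;
}

Kind enumReturn() {
  // Kind kind = NULL;  // rejected in C++: no implicit integer-to-enum conversion
  Kind kind = static_cast<Kind>(0);  // an explicit cast compiles; 0 is PLUS here
  return kind;
}

Expr valueReturn() {
  // Expr e = NULL;  // rejected: Expr has no constructor taking an integer
  Expr e = Expr();   // value-initialization is the C++ counterpart
  return e;
}

// Small self-check of the two initializations that do compile.
void checkDefaults() {
  assert(pointerReturn() == NULL);
  assert(enumReturn() == PLUS);
}
```

C accepts an integer where an enum is expected, which is presumably why the same initialization is harmless when the output is compiled as C.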

I can make the problem go away for the `Expr` values by changing every
`Expr` to `Expr*`. This is workable, though hardly ideal. But passing
around pointers for a simple enum like `Kind` seems ridiculous. One ugly
workaround I've found is to declare a second, dummy return value, which
pushes the `Kind` value into a struct and suppresses the
initialization to `NULL`. That is, `builtinOp` becomes

    builtinOp returns [Kind kind, bool dummy]
      : TOK_PLUS { $kind = PLUS; }
      | TOK_MINUS { $kind = MINUS; }
      ;

and, with `Expr` changed to `Expr*` as described above, the first
`expression` alternative becomes

    TOK_LPAREN k=builtinOp op1=expression op2=expression TOK_RPAREN
        { e = exprFactory->mkExpr(k.kind,*op1,*op2); }
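
Why the dummy value helps can be sketched as follows. This is only an illustration of the idea, assuming a rule with several return values is given a struct that is declared without a `NULL` initializer; `BuiltinOpReturn` and `builtinOp(char)` are hypothetical names, not the identifiers ANTLR actually emits.

```cpp
#include <cassert>

enum Kind { PLUS, MINUS };

// Hypothetical stand-in for the struct that bundles the rule's
// return values once there is more than one of them.
struct BuiltinOpReturn {
  Kind kind;
  bool dummy;  // present only to force the struct form
};

// Hypothetical stand-in for the rule function: the struct is declared
// and its fields are assigned, so no `Kind kind = NULL;` ever appears.
BuiltinOpReturn builtinOp(char tok) {
  BuiltinOpReturn retval;
  retval.dummy = false;
  retval.kind = (tok == '-') ? MINUS : PLUS;
  return retval;
}
```

A caller then reads the field explicitly, which is why the action above uses `k.kind` rather than `k`.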

There has to be a better way to do this. Am I missing a
configuration option to the C language backend? Is there another way
to arrange my grammar to avoid this awkwardness? Is there a pure C++
backend I can use?

Thanks,
Chris

