[antlr-interest] Creating a simple expression language

Sun Nov 23 13:20:48 PST 2008

2008/11/18 James Abley <james.abley at gmail.com>:
> Hi all,
>
> Sorry, I'm still stuck with this. I have taken the sample grammar from
> the book and reduced it as small as I know how currently.
>
>
> grammar Expr;
>
> options {
>        output=AST;
>        ASTLabelType=CommonTree;
>        k = 1;
> }
>
> prog    :       (stat {System.out.println($stat.tree.toStringTree());} )+ ;
>
> stat    :       expr    -> expr
>        ;
>
> expr    :       multExpr (('+' ^  | '-'^) multExpr)*
>        ;
>
> multExpr        :       atom ('*'^ atom)*
>        ;
>
> atom    :       INT
>        |       function
>        /* We don't want nodes for the parentheses */
>        |       '('! expr ')'!
>        ;
>
> /* We don't want nodes for the paretheses or commas. */
> function        :       FUNCNAME^ (WS)* '('! (WS)* (expr ( (WS)* ','! (WS)* expr )* )? ')'! ;
>
> FUNCNAME        :       ( 'a' .. 'z' | 'A' .. 'Z')+  ;
> INT     :       '0' .. '9'+ ;
> WS      :       ( ' ' | '\t' | '\n' | '\r' ) + {skip(); };
>
>
> I then have a tree grammar to go with this AST grammar.
>
>
>
> tree grammar Eval;
>
> options {
>        tokenVocab=Expr;        // read the token types from Expr.tokens
>        ASTLabelType=CommonTree;
> }
>
> prog    :       stat+ ;
>
> stat    :       expr
>                { System.out.println($expr.value); }
>        ;
>
> expr returns [int value]
>        :       ^('+' a=expr b=expr) { $value = a+b; }
>        |       ^('-' a=expr b=expr) { $value = a-b; }
>        |       ^('*' a=expr b=expr) { $value = a*b; }
>        |       ^( FUNCNAME expr*) { /* What do I do here? */ }
>        |       INT     {$value = Integer.parseInt($INT.text); }
>        ;
>
>
> I'm not sure how to plugin in the function evaluation using just the
> tree grammar.
>
>    @Test
>    public void simpleAdditionDefinedInGrammar() throws Exception {
>        ExprLexer lexer = new ExprLexer(new ANTLRStringStream("1+15+23-3"));
>        CommonTokenStream tokens = new CommonTokenStream(lexer);
>
>        ExprParser parser = new ExprParser(tokens);
>        ExprParser.expr_return r = parser.expr();
>
>        CommonTree tree = (CommonTree) r.getTree();
>
>        CommonTreeNodeStream nodes = new CommonTreeNodeStream(tree);
>
>        Eval eval = new Eval(nodes);
>
>        assertEquals(36, eval.expr());
>    }
>
> Evaluation using the rules defined in the grammar works fine, as per
> my test above. I need to have an environment to allow function binding
> and dispatch so that I can have a simple Java API for people to
> implement functions against, without having to know ANTLR and alter
> the grammar.
>
> Am I approaching this problem in the right way?
>
> I have been able to walk the CommonTree that I get back from the
> parser, but I'm not sure I'm doing that in the right way either. Is
> there an existing ANTLR API to walk / visit trees?
>
> [1] JUnit TestEval - http://pastie.org/317730 - the second test fails
> since my tree grammar currently has no way to do any sort of lookup to
> a function.
> [2] Function - http://pastie.org/317734
> [3] AddFunction - http://pastie.org/317733
> [4] Environment - http://pastie.org/317736
> [5] TreeEvaluator - http://pastie.org/317737
>
> Cheers,
>
> James
>>>
>>> James Abley wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm an ANTLR newbie. A code base that I work on has various expression
>>>> evaluation aspects. I have to add to this by defining various
>>>> functions that can be evaluated. ANTLR seemed like a good way of
>>>> separating out the parsing aspects and should let my colleagues
>>>> concentrate on just defining and plugging in new functions without
>>>> having to know much about parsing, etc. I've skimmed the ANTLR
>>>> Reference book, but don't quite have the time to go in depth at this
>>>> point.
>>>>
>>>> I've written a grammar, which seems to do what I need. Doubtless it
>>>> could be trimmed a bit as I learn more. Where I'm stuck is the
>>>> connection between having a grammar which can parse the input and how
>>>> it gets evaluated.
>>>>
>>>> The baggage that I'm struggling with is how to define my environment,
>>>> bind variables, create stack frames, etc.
>>>>
>>>> I think this would be as part of a tree grammar the re-uses the tokens
>>>> from the AST grammar, but would like to confirm.
>>>>
>>>> Cheers,
>>>>
>>>> James
>>>>
>>>>
>>>>
>>>> grammar Eval;
>>>>
>>>> options {
>>>>        output = AST;
>>>> //      tokenVocab=Expr; // Read token types from Expr.tokens resource
>>>> //      ASTLabelType=CommonTree;    // The Java type of the nodes.
>>>> }
>>>>
>>>> tokens {
>>>>        FUNC;   // function call
>>>>        STR;
>>>> }
>>>>
>>>> @parser::header {
>>>> package com.example.expression;
>>>> }
>>>>
>>>> @lexer::header {
>>>> package com.example.expression;
>>>> }
>>>>
>>>> stat    :       expr+;
>>>>
>>>> /*
>>>> For now, we define expr very basically. We don't need to support
>>>> addition, multiplication or other operators. But if we
>>>> do, the grammar is easy to alter.
>>>> */
>>>> expr    :       atom
>>>>        ;
>>>> //multExpr ( ( '+' | '-') multExpr)*;
>>>>
>>>> //multExpr
>>>> //      :       unaryExpr (( '*' | '/') unaryExpr)*;
>>>>
>>>> //unaryExpr
>>>> //      :       ('+' | '-')?  atom
>>>> //      ;
>>>>
>>>> /* Basic constituent of an expression.*/
>>>> atom    :       var
>>>>        |       LPAREN expr RPAREN      // Rule to allow nested
>>>> expressions.
>>>>        |       functionCall
>>>>        |       stringLiteral
>>>>        |       number
>>>>        ;
>>>>
>>>> functionCall
>>>>        :       functionName LPAREN ( expr (COMMA expr)* )? RPAREN      ->
>>>> ^(FUNC
>>>> functionName expr*)
>>>>        ;
>>>>
>>>> functionName
>>>>        :       ALPHA (ALPHA | '-' | '_' | DIGIT )* ;
>>>> /*
>>>> Added to indicate how we currently reference bound variables in
>>>> expressions.. This lets us parse them easily enough.
>>>> with a view to consolidating our expression evaluation code into this
>>>> ANTLR-based version.
>>>> */
>>>> var     :       '$' ALPHA (ALPHA | '-' | '_' | DIGIT)*
>>>>        ;
>>>>
>>>> stringLiteral   :       '"'  ~'"'* '"'
>>>>        |       '\'' ~'\''* '\''
>>>>        ;
>>>>
>>>> number  :       DIGIT+ ('.' DIGIT+)?
>>>>        ;
>>>>
>>>> DIGIT
>>>>        :       '0' .. '9';
>>>>
>>>> ALPHA
>>>>        :        'a' .. 'z'
>>>>        |        'A' .. 'Z';
>>>>
>>>> COMMA
>>>>        :       (WS* ',' WS*);
>>>>
>>>> LPAREN
>>>>        :       (WS*  '(' WS*);
>>>> RPAREN
>>>>        :       (WS* ')' WS*);
>>>>
>>>> WS
>>>>        :       ' '
>>>>        |       '\t';
>>>>

Hi,

Any suggestions on this (including further books to read?). I think I
have a conceptual block to break from being too tightly wedded to my
ideas of how to implement this using hand-written parser; in
particular my experiences with things like Java XPath libraries. I
just need a hand getting through that block.

Cheers,

James