[antlr-interest] Newbie question on symbol tables and type validation

Aditya Damodaran adityad at gmail.com
Sun Jan 16 21:01:02 PST 2011


First of all : Hello world! I'm a complete antlr newbie that is attempting
to navigate this all powerful tool (TParr you are my hero!) and create a
simple language which has assignments and expressions. The references are to
external data objects in a schema and I need to make sure the types work out
correctly and the scopes are multiplied through...e.g.


c.d= x.y + sum(a.b{c*d.e})*g

will become:

c.d=x.y + sum(a.b.c * a.b.d.e)*g


I also need to make sure that all these references are valid and the types
for all of these expressions work out correctly (e.g. am not assigning a
String to a float etc.). Using the 2 books and copious amounts of JavaDude
(thanks to Scott S as well!) I have blunderingly reached the below grammar
which is nice.


What I'm confused about completely is how I can return types from the
expression matching rules and still have the AST so I can eventually
evaluate it. I know this is a whale of a question but any pointers would be
super useful.

To summarize:


1)

How do I turn the following tree:

(ASSIGNMENT (REFERENCE x) (FUNC sum (SCOPE (REFERENCE a b) (* (REFERENCE c)
(REFERENCE d)))))


Into

(ASSIGNMENT (REFERENCE x) (FUNC sum (* (REFERENCE a b c) (REFERENCE a b d)))


using tree grammars or some kind of AST expressions.


2)Also how do I make sure that a.b.c is a valid symbol and the types of the
expressions match up? I have collections of all valid symbols and their
types etc. I can obviously take the tree and crete a copy with my own types
and then do multiple runs across it but I feel like I'm missing some basic
info which would speed up the task tremendously.


3)How do I at rule "term" below make sure that any subrule which matches
qualifiedid is replaced (after validation) by the concatenation of the scope
being stored globally and the tokens in the list for qualified ids?


Please help if you can...


Thanks,

Adi


Newbie grammar below:




grammar Rules;

options{

  language=Java;

  output=AST;

  ASTLabelType=CommonTree;

}


tokens{

PLUS='+';

MINUS='-';

MULTOP='*';

DIV='/';

POW='^';

NOT='!';

GT='>';

LT='<';

GE='>=';

LE='<=';

NE='!=';

EQ='==';

ASSIGN='=';

PARENOPEN='(';

PARENCLOSE=')';

AND='and';

OR='or';

SCOPEOPEN='{';

SCOPECLOSE='}';

DEREF='.';

COMMA=',';

IF='if';

SCOPE;

FUNC;

REFERENCE;

ASSIGNMENT;

}

@header {

  package rules;

}


@lexer::header {

  package rules;

}




rule:  (statement)+ EOF!;


vardef:

      qualifiedid ASSIGN^ expression

;


qualifiedid:

      j+=ID ((DEREF j+=ID)*) -> ^(REFERENCE $j+)

;


statement:

  vardef;



evaledif:

  IF^ PARENOPEN expression COMMA expression COMMA expression PARENCLOSE;


scopedexpression:

  qualifiedid so=SCOPEOPEN expression SCOPECLOSE -> ^(SCOPE qualifiedid
expression);


funccall:

      ID po=PARENOPEN exprlist PARENCLOSE -> ^(FUNC ID exprlist)

;


exprlist:

 (expression (COMMA (expression))*)?

;


term:

       qualifiedid

     | PARENOPEN expression PARENCLOSE -> expression

     | INT

     | FLOAT

     | STRING

     | funccall

     | scopedexpression

     | evaledif

;


//operator precedence

negation: NOT^* term;

unary:  (PLUS!|MINUS^)* negation;

power  : unary (POW^ unary)*;

mult: power ((MULTOP^|DIV^) power)*;

add : mult ((PLUS^|MINUS^) mult)*;

compare: add ((GT^|LT^|EQ^|GE^|LE^|NE^) add)*;

expression: compare ((AND^ | OR^) compare)*;



ID  : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*

    ;



INT : '0'..'9'+

    ;


FLOAT

    :   ('0'..'9')+ '.' ('0'..'9')* EXPONENT?

    |   '.' ('0'..'9')+ EXPONENT?

    |   ('0'..'9')+ EXPONENT

    ;


COMMENT

    :   '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}

    |   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}

    ;


WS  :   ( ' '

        | '\t'

        | '\r'

        | '\n'

        ) {$channel=HIDDEN;}

    ;


STRING

    :  '"' ( ESC_SEQ | ~('\\'|'"') )* '"'

    ;


fragment

EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;


fragment

HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;


fragment

ESC_SEQ

    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')

    |   UNICODE_ESC

    |   OCTAL_ESC

    ;


fragment

OCTAL_ESC

    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')

    |   '\\' ('0'..'7') ('0'..'7')

    |   '\\' ('0'..'7')

    ;


fragment

UNICODE_ESC

    :   '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT

    ;


More information about the antlr-interest mailing list