[antlr-interest] AST build with input tokens out of order

John B. Brodie jbb at acm.org
Sat Aug 20 07:39:51 PDT 2011


Greetings!

Sorry for jumping in...

On Sat, 2011-08-20 at 14:40 +0200, Robert Jarzmik wrote:
> Hi Bart,
> 
> Could I abuse a bit more of your time?
> 
> My initial ordering problem came back because of the order of my rules.
> 
> I have this input : STRUCT myvar1, myvar2 ( INTEGER i1; INTEGER j1; )
> 
> The tree I'd like to have is :
>  #(DECL_VARIABLE 'myvar1'
>    #(STRUCT
>      #(DECL_VARIABLE 'i1' INTEGER)
>      #(DECL_VARIABLE 'j1' INTEGER)
>     )
>    )
>  #(DECL_VARIABLE 'myvar2'
>    #(STRUCT
>      #(DECL_VARIABLE 'i1' INTEGER)
>      #(DECL_VARIABLE 'j1' INTEGER)
>     )
>    )
> 
> I have a rule 'structure_members' which rewrites '( INTEGER i1; INTEGER j1; )'
> into #(STRUCT #(DECL_VARIABLE 'i1' INTEGER) #(DECL_VARIABLE 'j1' INTEGER)).
> 
> My problem is the main declaration rule :
> declaration
>     : STRUCT identifiers[$structure_members.tree] structure_members
>       -> ^(DECL_VARIABLE identifiers+ structure_members)
> 
> The 'structure_members' are only available after identifiers are parsed, and I
> cannot give identifiers the parameter $structure_members.tree as I wished (I get
> a forward reference error).
> 
> Is there a way to pass the structure_members tree to the identifiers rule?

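i do not think there is. a rule argument can only refer to elements that have
already been matched, and structure_members is not matched until after
identifiers has returned; that is what the forward reference error is telling you.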

but you can lift (inline) the identifiers rule into the declaration rule, so that
all of the elements (in particular the list of identifiers, collected with the
id+= list label) are available to the rewrite. the identifiers rule is then
deleted and the declaration rule becomes:

declaration
    : type_identifier id+=IDENTIFIER (',' id+=IDENTIFIER)*
      -> ^(DECL_VARIABLE $id type_identifier)+
    | STRUCT id+=IDENTIFIER (',' id+=IDENTIFIER)* '(' structure_members ')'
      -> ^(DECL_VARIABLE $id structure_members)+
    ;
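the rewrite works because the + loop runs once per identifier in the id list,
and structure_members (a single element) is referenced inside that loop. as far
as i understand ANTLR 3's rewrite streams, such a single-valued element is used
as-is on the first iteration and deep-copied on each later one, so every
DECL_VARIABLE ends up with its own STRUCT subtree. the plain-Java sketch below
only models that behaviour under this assumption; the Node class and the
declareAll/sampleMembers helpers are hypothetical and are not part of the
ANTLR runtime.

```java
import java.util.ArrayList;
import java.util.List;

public class RewriteLoopSketch {
    // Hypothetical stand-in for an AST node; this is NOT ANTLR's CommonTree.
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String text, Node... kids) {
            this.text = text;
            for (Node k : kids) {
                children.add(k);
            }
        }

        // Deep copy of this subtree.
        Node dup() {
            Node copy = new Node(text);
            for (Node c : children) {
                copy.children.add(c.dup());
            }
            return copy;
        }

        // LISP-style rendering, similar in spirit to toStringTree().
        String toStringTree() {
            if (children.isEmpty()) {
                return text;
            }
            StringBuilder sb = new StringBuilder("(").append(text);
            for (Node c : children) {
                sb.append(' ').append(c.toStringTree());
            }
            return sb.append(')').toString();
        }
    }

    // Models "-> ^(DECL_VARIABLE $id structure_members)+": one DECL_VARIABLE
    // subtree per identifier in the id+= list. The single members subtree is
    // used as-is the first time and deep-copied on every later iteration
    // (assumed to mirror ANTLR 3's duplication of single-valued elements).
    static List<Node> declareAll(List<String> ids, Node members) {
        List<Node> result = new ArrayList<>();
        for (int i = 0; i < ids.size(); i++) {
            Node m = (i == 0) ? members : members.dup();
            result.add(new Node("DECL_VARIABLE", new Node(ids.get(i)), m));
        }
        return result;
    }

    // Hand-built tree for "( INTEGER i1; INTEGER j1; )" after structure_members.
    static Node sampleMembers() {
        return new Node("STRUCT",
                new Node("DECL_VARIABLE", new Node("i1"), new Node("INTEGER")),
                new Node("DECL_VARIABLE", new Node("j1"), new Node("INTEGER")));
    }

    public static void main(String[] args) {
        for (Node d : declareAll(List.of("myvar1", "myvar2"), sampleMembers())) {
            System.out.println(d.toStringTree());
        }
    }
}
```

running main prints one line per identifier, of the form
(DECL_VARIABLE myvarN (STRUCT (DECL_VARIABLE i1 INTEGER) (DECL_VARIABLE j1 INTEGER))),
which is the shape Robert asked for.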

see the attachment below for a complete grammar, including the test driver
i used to verify this.

> 
> Cheers.
> 
> --
> 
> Robert
> 
> PS: All the input data I'm using:
>   => INPUT = "INTEGER good; STRUCT myvar1, myvar2 ( INTEGER i1; INTEGER j1; )"
>   => The example grammar to demonstrate the issue
> /******************************************************************************
>  * LTR_ex4.g
>  ******************************************************************************/
> grammar LTR_ex4;
> 
> options {
>   k=1;
>   output=AST;
>   ASTLabelType=CommonTree;
> }
> 
> tokens {
>   DIMS; DECL_VARIABLE;
> }
> 
> translation_unit
> 	: declaration (';'! declaration)*
> 	;
>     
> declaration
> 	: type_identifier identifiers[$type_identifier.tree]
> 	-> ^(DECL_VARIABLE identifiers)+
> 	| STRUCT identifiers[$structure_members.tree] '(' structure_members ')'
> 	-> ^(DECL_VARIABLE identifiers)
> 	;
> 
> structure_members
> 	: (declaration ';')+
> 	-> ^(STRUCT declaration+)
> 	;
>     
> identifiers[CommonTree type]
> 	: IDENTIFIER (',' IDENTIFIER)*
> 	-> ({type} IDENTIFIER)+
> 	;
> 
> type_identifier
>     	:	 'INTEGER'
> 	;
> 
> STRUCT 	:	'STRUCT' ;
> IDENTIFIER  :('a'..'z' | '0'..'9')+ ;
> WS  :   ( ' ' | '\t'| '\r'| '\n') {$channel=HIDDEN;};
> 

-------------- next part --------------
grammar Test;

options {
   output = AST;
   ASTLabelType = CommonTree;
}

tokens {
  DIMS; DECL_VARIABLE;
}

@members {

   // test data - each string in the following array is parsed separately
   private static final String [] x = new String[] {
      "INTEGER v1",
      "INTEGER v1; INTEGER v2",
      "INTEGER v1, v2",
      "STRUCT myvar1 ( INTEGER i1; INTEGER j1; )",
      "STRUCT myvar2 ( INTEGER i1, j1; )",
      "INTEGER v1, v2; STRUCT myvar1, myvar2 ( INTEGER i1; INTEGER j1; )",
      "STRUCT x (STRUCT y (STRUCT z (INTEGER i,j;););)"
   };

   public static void main(String [] args) {
      for( int i = 0; i < x.length; ++i ) {
         try {
            System.out.println("about to parse:`"+x[i]+"`");

            TestLexer lexer = new TestLexer(new ANTLRStringStream(x[i]));
            CommonTokenStream tokens = new CommonTokenStream(lexer);

            // System.out.format("dump of the token stream:\%n");
            // int j = 0;
            // boolean looping = true;
            // while( looping ) {
            //    Token token = lexer.nextToken();
            //    int typ = token.getType();
            //    System.out.format("\%d: type = \%s, text = `\%s`\%s\%n",
            //                      j++,
            //                      typ==EOF?"EOF":tokenNames[typ],
            //                      token.getText(),
            //                      token.getChannel()==HIDDEN?" (HIDDEN)":"");
            //    looping = typ != EOF;
            // }
            // lexer.reset();
            // System.out.format("now performing the parse\n");

            TestParser parser = new TestParser(tokens);
            TestParser.test_return p_result = parser.test();

            CommonTree ast = p_result.tree;
            if( ast == null ) {
               System.out.println("resultant tree: is NULL");
            } else {
               System.out.println("resultant tree: " + ast.toStringTree());
            }
            System.out.println();
         } catch(Exception e) {
            e.printStackTrace();
         }
      }
   }
}

test : translation_unit EOF! ;

translation_unit
        : declaration (';'! declaration)*
        ;
    
declaration
        : type_identifier id+=IDENTIFIER (',' id+=IDENTIFIER)*
        -> ^(DECL_VARIABLE $id type_identifier)+
        | STRUCT id+=IDENTIFIER (',' id+=IDENTIFIER)* '(' structure_members ')'
        -> ^(DECL_VARIABLE $id structure_members)+
        ;

structure_members
        : (declaration ';')+
        -> ^(STRUCT declaration+)
        ;
    
type_identifier
        :        'INTEGER'
        ;

STRUCT  :       'STRUCT' ;
IDENTIFIER  :('a'..'z' | '0'..'9')+ ;
WS  :   ( ' ' | '\t'| '\r'| '\n') {$channel=HIDDEN;};


More information about the antlr-interest mailing list