[antlr-interest] Problems with spurious EOF tree nodes while working with a tree parser (update)

Stefan Mätje Stefan.Maetje at esd-electronics.com
Fri Apr 20 00:38:39 PDT 2012


Hi ANTLR community,

perhaps I should ask my question from a different angle.

Is it possible, in a filtering tree parser, to match simple sequences of 
two or more tokens? If so, how can it be done, or is it impossible?

In my parser I have the following "statement" rule that parses common 
statements and generates the AST:

statement
	:  (ID ':')* unlabeledStatement
		-> (LBL_DCL ID)* unlabeledStatement
	;
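
To make the resulting AST shape concrete, here is a small Java model of what the `statement` rewrite above produces. It is a hypothetical illustration, not generated ANTLR code: the class `FlatLabelRewrite` and its `rewrite` method are made up for this sketch, and it assumes `unlabeledStatement` yields the flat "'GOTO' ID" pair described in the quoted message. The point it shows is that each label contributes an LBL_DCL leaf followed by its ID as a sibling, so only position ties the ID to its LBL_DCL.

```java
import java.util.ArrayList;
import java.util.List;

public class FlatLabelRewrite {
    // Hypothetical model of the rewrite (LBL_DCL ID)* unlabeledStatement:
    // each label adds two sibling nodes; LBL_DCL itself gets no children.
    static List<String> rewrite(List<String> labels, List<String> statementNodes) {
        List<String> out = new ArrayList<>();
        for (String id : labels) {
            out.add("LBL_DCL"); // imaginary token, emitted as a leaf
            out.add(id);        // the label's ID follows as a sibling
        }
        out.addAll(statementNodes); // assumed flat, e.g. 'GOTO' ID
        return out;
    }

    public static void main(String[] args) {
        // Models "lbl1: lbl2: GOTO somewhere;"
        List<String> ast = rewrite(List.of("lbl1", "lbl2"), List.of("'GOTO'", "somewhere"));
        System.out.println(String.join(" ", ast)); // LBL_DCL lbl1 LBL_DCL lbl2 'GOTO' somewhere
    }
}
```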

A statement like "lbl1: lbl2: GOTO somewhere;" would generate an AST 
like this:

"LBL_DCL lbl1 LBL_DCL lbl2 'GOTO' somewhere"

with 'lbl1', 'lbl2' and 'somewhere' being ID nodes. Is it possible to 
match these sequences with filter-mode tree grammar rules, as I tried in 
the rules quoted below, without this unexpected 'EOF' tree node getting 
in the way?

Thanks for any help,
	Stefan

On 19.04.2012 20:26, Stefan Mätje wrote:
> Hi,
>
> I generate an AST with a combined lexer/parser grammar. Then I feed the
> generated AST via a CommonTreeNodeStream into a tree grammar to build up
> a symbol table. The tree grammar is in filter mode, and I use my
> own tree node class, Pearl90Tree, so I also created a custom
> Pearl90TreeAdaptor class.
>
> In my tree grammar I have two very simple rules quoted below:
>
> label_dcl
> //	:	LBL_DCL ID		// Won't match
> 	:	LBL_DCL EOF ID		// Will match
> 	{
> 		dbgOut.println("->  Label at " + $ID.line + ":" + $ID.pos);
> 	}
> 	;
>
> label_resolve
> //	:	gt='GOTO' id=ID			// Won't match
> 	:	gt='GOTO' eof=EOF myId=ID	// Will match
> 	{
> 		dbgOut.println("GOTO_Label "+$myId.toString());
> 		dbgOut.println("EOF #"+$eof.serial);
> 	}
> 	;
>
> The parser simply generates "LBL_DCL ID" for each label definition and
> the sequence "'GOTO' ID" for a goto statement. I verified that the AST
> is correct. I also dumped the CommonTreeNodeStream to check that it
> contains no EOF tree node after the 'GOTO' or LBL_DCL tree node. My
> test source input is this:
>
> MODULE goto;
> PROBLEM;
> P: PROC;
>     label: GOTO label;
> END;
> MODEND;
>
> Here is the dump of the CommonTreeNodeStream:
>
> +++++++++++++++ Tree +++++++++++++++++++++
> Pearl90Tree node #-1, c:2; token type: 92 'MODULE', value: 'MODULE'
> Pearl90Tree node #-1, c:0; token type: 2 '<DOWN>', value: 'DOWN'
> Pearl90Tree node #0, c:0; token type: 73 'ID', value: 'goto'
> Pearl90Tree node #1, c:1; token type: 182 ''PROBLEM'', value: 'PROBLEM'
> Pearl90Tree node #-1, c:0; token type: 2 '<DOWN>', value: 'DOWN'
> Pearl90Tree node #0, c:3; token type: 115 'PROC_DCL', value: 'PROC_DCL'
> Pearl90Tree node #-1, c:0; token type: 2 '<DOWN>', value: 'DOWN'
> Pearl90Tree node #0, c:0; token type: 73 'ID', value: 'P'
> Pearl90Tree node #1, c:0; token type: 93 'MOD_LIST', value: 'MOD_LIST'
> Pearl90Tree node #2, c:4; token type: 27 'BODY', value: 'BODY'
> Pearl90Tree node #-1, c:0; token type: 2 '<DOWN>', value: 'DOWN'
> Pearl90Tree node #0, c:0; token type: 85 'LBL_DCL', value: 'LBL_DCL'
> Pearl90Tree node #1, c:0; token type: 73 'ID', value: 'label'
> Pearl90Tree node #2, c:0; token type: 77 'KW_GOTO', value: 'GOTO'
> Pearl90Tree node #3, c:0; token type: 73 'ID', value: 'label'
> Pearl90Tree node #-1, c:0; token type: 3 '<UP>', value: 'UP'
> Pearl90Tree node #-1, c:0; token type: 3 '<UP>', value: 'UP'
> Pearl90Tree node #-1, c:0; token type: 3 '<UP>', value: 'UP'
> Pearl90Tree node #-1, c:0; token type: 3 '<UP>', value: 'UP'
> Pearl90Tree node #-1, c:0; token type: -1 '>EOF<', value: 'EOF'
>
> What I can see is that the tree parser in filter mode creates lots of
> UP, DOWN and EOF tree nodes, and apparently it inserts some of these
> EOF nodes between the others. (I know this because the tree parser
> calls my Pearl90TreeAdaptor to create these nodes.)
>
> Because of this inserted EOF, the label_resolve rule only matches
> "'GOTO' EOF ID", which seems very strange to me. And why doesn't the
> plain "LBL_DCL ID" sequence match either?
>
> What am I doing wrong? Any suggestions to get this running without
> putting this "magic" EOF in between?
>
> Thanks in advance,
> 	Stefan
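
The Java sketch below is a simplified model, not ANTLR's runtime code, of the node streams discussed in the quoted message. `flatten` reproduces the shape of the CommonTreeNodeStream dump: a pre-order walk with DOWN/UP markers and one trailing EOF. `iterateFrom` models one hypothesis that would fit the observed matches; it is an assumption, not checked against the ANTLR sources. The hypothesis is that in filter mode each rule is tried on a stream that starts at a single subtree, and if that stream's iterator climbs through parent pointers without stopping at the subtree root, a leaf such as LBL_DCL yields LBL_DCL, then EOF, then its following siblings. Under that assumption "LBL_DCL ID" cannot match, while "LBL_DCL EOF ID" and "'GOTO' EOF ID" can. The `Node` class and both methods are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class SubtreeStreamModel {
    // Hypothetical stand-in for a tree node (not Pearl90Tree or CommonTree).
    static final class Node {
        final String text;
        Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String text, Node... kids) {
            this.text = text;
            for (Node k : kids) {
                k.parent = this; // every child keeps a pointer to its parent
                children.add(k);
            }
        }

        int childIndex() {
            return parent == null ? -1 : parent.children.indexOf(this);
        }
    }

    // Pre-order flattening with DOWN/UP markers and one trailing EOF,
    // matching the shape of the CommonTreeNodeStream dump.
    static List<String> flatten(Node root) {
        List<String> out = new ArrayList<>();
        walk(root, out);
        out.add("EOF");
        return out;
    }

    private static void walk(Node n, List<String> out) {
        out.add(n.text);
        if (n.children.isEmpty()) return;
        out.add("DOWN");
        for (Node c : n.children) walk(c, out);
        out.add("UP");
    }

    // ASSUMPTION: models an iterator that starts at `start` but, once it runs
    // out of children, climbs through parent pointers WITHOUT stopping at
    // `start`. A leaf start node then yields: start, EOF, following siblings.
    static List<String> iterateFrom(Node start, int limit) {
        List<String> out = new ArrayList<>();
        Deque<String> queued = new ArrayDeque<>();
        Node tree = start;
        out.add(start.text);
        if (start.children.isEmpty()) queued.add("EOF"); // single-node special case
        while (out.size() < limit) {
            if (!queued.isEmpty()) { out.add(queued.poll()); continue; }
            if (tree == null) { out.add("EOF"); continue; }
            if (!tree.children.isEmpty()) {          // descend to first child
                tree = tree.children.get(0);
                out.add("DOWN");
                queued.add(tree.text);
                continue;
            }
            Node parent = tree.parent;               // climb while out of siblings
            while (parent != null && tree.childIndex() + 1 >= parent.children.size()) {
                queued.add("UP");
                tree = parent;
                parent = tree.parent;
            }
            if (parent == null) { tree = null; queued.add("EOF"); continue; }
            tree = parent.children.get(tree.childIndex() + 1); // next sibling
            queued.add(tree.text);
        }
        return out;
    }

    public static void main(String[] args) {
        // The tree from the dump above.
        Node lblDcl = new Node("LBL_DCL");
        Node gotoKw = new Node("GOTO");
        Node body = new Node("BODY", lblDcl, new Node("label"), gotoKw, new Node("label"));
        Node root = new Node("MODULE", new Node("goto"),
                new Node("PROBLEM", new Node("PROC_DCL", new Node("P"), new Node("MOD_LIST"), body)));

        System.out.println(String.join(" ", flatten(root)));
        System.out.println(String.join(" ", iterateFrom(lblDcl, 3))); // LBL_DCL EOF label
        System.out.println(String.join(" ", iterateFrom(gotoKw, 3))); // GOTO EOF label
    }
}
```

In this model, an iterator that stopped climbing once it returned to `start` would give just "LBL_DCL EOF" for a leaf, so no filter rule could see the following ID sibling at all.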

