[antlr-interest] Problems with spurious EOF tree nodes while working with a tree parser

Stefan Mätje Stefan.Maetje at esd-electronics.com
Thu Apr 19 11:26:26 PDT 2012


Hi,

I generate an AST with a combined lexer/parser grammar. Then I feed the 
generated AST via a CommonTreeNodeStream into a tree grammar to build up 
a symbol table. The tree grammar is in filter mode and I use my
own tree nodes called Pearl90Tree. Therefore I created a custom
Pearl90TreeAdaptor class.

In my tree grammar I have two very simple rules quoted below:

label_dcl
//	:	LBL_DCL ID		// Won't match
	:	LBL_DCL EOF ID		// Will match
	{
		dbgOut.println("-> Label at " + $ID.line + ":" + $ID.pos);
	}
	;

label_resolve
//	:	gt='GOTO' id=ID			// Won't match
	:	gt='GOTO' eof=EOF myId=ID	// Will match
	{
		dbgOut.println("GOTO_Label "+$myId.toString());
		dbgOut.println("EOF #"+$eof.serial);
	}
	;

The parser simply generates "LBL_DCL ID" for each label definition and 
"'GOTO' ID" for each goto statement. I verified that the AST is 
correct. I also dumped the CommonTreeNodeStream to confirm that it does 
not contain any EOF tree node after the 'GOTO' or LBL_DCL tree node. My 
test source input is this:

MODULE goto;
PROBLEM;
P: PROC;
   label: GOTO label;
END;
MODEND;

The dump of the CommonTreeNodeStream follows:

+++++++++++++++ Tree +++++++++++++++++++++
Pearl90Tree node #-1, c:2; token type: 92 'MODULE', value: 'MODULE'
Pearl90Tree node #-1, c:0; token type: 2 '<DOWN>', value: 'DOWN'
Pearl90Tree node #0, c:0; token type: 73 'ID', value: 'goto'
Pearl90Tree node #1, c:1; token type: 182 ''PROBLEM'', value: 'PROBLEM'
Pearl90Tree node #-1, c:0; token type: 2 '<DOWN>', value: 'DOWN'
Pearl90Tree node #0, c:3; token type: 115 'PROC_DCL', value: 'PROC_DCL'
Pearl90Tree node #-1, c:0; token type: 2 '<DOWN>', value: 'DOWN'
Pearl90Tree node #0, c:0; token type: 73 'ID', value: 'P'
Pearl90Tree node #1, c:0; token type: 93 'MOD_LIST', value: 'MOD_LIST'
Pearl90Tree node #2, c:4; token type: 27 'BODY', value: 'BODY'
Pearl90Tree node #-1, c:0; token type: 2 '<DOWN>', value: 'DOWN'
Pearl90Tree node #0, c:0; token type: 85 'LBL_DCL', value: 'LBL_DCL'
Pearl90Tree node #1, c:0; token type: 73 'ID', value: 'label'
Pearl90Tree node #2, c:0; token type: 77 'KW_GOTO', value: 'GOTO'
Pearl90Tree node #3, c:0; token type: 73 'ID', value: 'label'
Pearl90Tree node #-1, c:0; token type: 3 '<UP>', value: 'UP'
Pearl90Tree node #-1, c:0; token type: 3 '<UP>', value: 'UP'
Pearl90Tree node #-1, c:0; token type: 3 '<UP>', value: 'UP'
Pearl90Tree node #-1, c:0; token type: 3 '<UP>', value: 'UP'
Pearl90Tree node #-1, c:0; token type: -1 '>EOF<', value: 'EOF'

What I can see is that the tree parser in filter mode generates lots 
of UP, DOWN and EOF tree nodes. Apparently the tree parser inserts some 
of these EOF nodes between the others. (I know this because the tree 
parser calls my Pearl90TreeAdaptor to create these nodes.)
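
For reference, here is a minimal sketch of how a node stream linearizes a 
tree: each node is emitted, and a node with children is followed by an 
imaginary DOWN node, its children, and an imaginary UP node, with a single 
EOF at the very end. This is a simplified stand-alone model, not the ANTLR 
runtime code. The class NodeStreamSketch, its Node type and the serialize 
method are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class NodeStreamSketch {

    /** Hypothetical stand-in for a tree node such as Pearl90Tree. */
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String text, Node... kids) {
            this.text = text;
            for (Node k : kids) {
                children.add(k);
            }
        }
    }

    /** Emits the node, then DOWN, its children and UP if it has children. */
    static void flatten(Node n, List<String> out) {
        out.add(n.text);
        if (n.children.isEmpty()) {
            return; // leaf: no navigation nodes around it
        }
        out.add("DOWN");
        for (Node c : n.children) {
            flatten(c, out);
        }
        out.add("UP");
    }

    /** Linearizes the whole tree and appends one EOF marker at the end. */
    static List<String> serialize(Node root) {
        List<String> out = new ArrayList<>();
        flatten(root, out);
        out.add("EOF");
        return out;
    }

    public static void main(String[] args) {
        // The BODY subtree from the dump: four flat siblings, no nesting.
        Node body = new Node("BODY",
                new Node("LBL_DCL"), new Node("ID"),
                new Node("KW_GOTO"), new Node("ID"));
        // prints: BODY DOWN LBL_DCL ID KW_GOTO ID UP EOF
        System.out.println(String.join(" ", serialize(body)));
    }
}
```

In this model LBL_DCL and ID end up directly adjacent, and EOF appears only 
once, after the last UP. That matches the dump above.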

Because of this added EOF, the rules only match "LBL_DCL EOF ID" and 
"'GOTO' EOF ID", which I find very strange. Why don't they match the 
plain "LBL_DCL ID" and "'GOTO' ID" sequences that the dump shows?

What am I doing wrong? Any suggestions to get this running without 
putting this "magic" EOF in between?

Thanks in advance,
	Stefan




