[antlr-interest] Problems with spurious EOF tree nodes while working with a tree parser
Stefan Mätje
Stefan.Maetje at esd-electronics.com
Thu Apr 19 11:26:26 PDT 2012
Hi,
I generate an AST with a combined lexer/parser grammar. Then I feed the
generated AST via a CommonTreeNodeStream into a tree grammar to build up
a symbol table. The tree grammar is in filter mode and I use my
own tree nodes called Pearl90Tree. Therefore I created a custom
Pearl90TreeAdaptor class.
In my tree grammar I have two very simple rules quoted below:
label_dcl
//  : LBL_DCL ID        // Won't match
    : LBL_DCL EOF ID    // Will match
      {
        dbgOut.println("-> Label at " + $ID.line + ":" + $ID.pos);
      }
    ;
label_resolve
//  : gt='GOTO' id=ID               // Won't match
    : gt='GOTO' eof=EOF myId=ID     // Will match
      {
        dbgOut.println("GOTO_Label "+$myId.toString());
        dbgOut.println("EOF #"+$eof.serial);
      }
    ;
The parser simply generates "LBL_DCL ID" for each label definition and a
sequence of "'GOTO' ID" for a goto statement. I verified that the AST is
correct. I also dumped the CommonTreeNodeStream to check that it doesn't
contain any EOF tree node after the 'GOTO' or LBL_DCL tree node. My
test source input is this:
MODULE goto;
PROBLEM;
P: PROC;
label: GOTO label;
END;
MODEND;
The CommonTreeNodeStream dumped follows here:
+++++++++++++++ Tree +++++++++++++++++++++
Pearl90Tree node #-1, c:2; token type: 92 'MODULE', value: 'MODULE'
Pearl90Tree node #-1, c:0; token type: 2 '<DOWN>', value: 'DOWN'
Pearl90Tree node #0, c:0; token type: 73 'ID', value: 'goto'
Pearl90Tree node #1, c:1; token type: 182 ''PROBLEM'', value: 'PROBLEM'
Pearl90Tree node #-1, c:0; token type: 2 '<DOWN>', value: 'DOWN'
Pearl90Tree node #0, c:3; token type: 115 'PROC_DCL', value: 'PROC_DCL'
Pearl90Tree node #-1, c:0; token type: 2 '<DOWN>', value: 'DOWN'
Pearl90Tree node #0, c:0; token type: 73 'ID', value: 'P'
Pearl90Tree node #1, c:0; token type: 93 'MOD_LIST', value: 'MOD_LIST'
Pearl90Tree node #2, c:4; token type: 27 'BODY', value: 'BODY'
Pearl90Tree node #-1, c:0; token type: 2 '<DOWN>', value: 'DOWN'
Pearl90Tree node #0, c:0; token type: 85 'LBL_DCL', value: 'LBL_DCL'
Pearl90Tree node #1, c:0; token type: 73 'ID', value: 'label'
Pearl90Tree node #2, c:0; token type: 77 'KW_GOTO', value: 'GOTO'
Pearl90Tree node #3, c:0; token type: 73 'ID', value: 'label'
Pearl90Tree node #-1, c:0; token type: 3 '<UP>', value: 'UP'
Pearl90Tree node #-1, c:0; token type: 3 '<UP>', value: 'UP'
Pearl90Tree node #-1, c:0; token type: 3 '<UP>', value: 'UP'
Pearl90Tree node #-1, c:0; token type: 3 '<UP>', value: 'UP'
Pearl90Tree node #-1, c:0; token type: -1 '>EOF<', value: 'EOF'
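For comparison, the shape of this dump can be reproduced with a small
sketch that does not depend on ANTLR. It assumes the usual flattening
scheme (each node, followed by DOWN, its children and UP if it has
children, with a single EOF at the very end). The names NodeStreamSketch,
Node, flatten and sample are hypothetical, and this is only a simplified
model, not the actual CommonTreeNodeStream implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NodeStreamSketch {
    /** Minimal tree node: a label plus ordered children (hypothetical stand-in for Pearl90Tree). */
    static final class Node {
        final String text;
        final List<Node> children;

        Node(String text, Node... children) {
            this.text = text;
            this.children = Arrays.asList(children);
        }
    }

    /** Depth-first walk: emit the node, then DOWN, its children and UP if it has any children. */
    static void walk(Node n, List<String> out) {
        out.add(n.text);
        if (!n.children.isEmpty()) {
            out.add("DOWN");
            for (Node c : n.children) {
                walk(c, out);
            }
            out.add("UP");
        }
    }

    /** Flatten a whole tree and append one EOF marker at the end (assumed scheme). */
    static List<String> flatten(Node root) {
        List<String> out = new ArrayList<>();
        walk(root, out);
        out.add("EOF");
        return out;
    }

    /** The AST from the dump above, rebuilt by hand from the child counts (c:N). */
    static Node sample() {
        return new Node("MODULE",
            new Node("ID:goto"),
            new Node("PROBLEM",
                new Node("PROC_DCL",
                    new Node("ID:P"),
                    new Node("MOD_LIST"),
                    new Node("BODY",
                        new Node("LBL_DCL"),
                        new Node("ID:label"),
                        new Node("KW_GOTO"),
                        new Node("ID:label")))));
    }

    public static void main(String[] args) {
        // Prints the flattened stream on one line, separated by spaces.
        System.out.println(String.join(" ", flatten(sample())));
    }
}
```

In this model LBL_DCL is directly followed by ID, and KW_GOTO by ID, with
the only EOF at the end of the stream, which is what the dump shows.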
What I can see is that the tree parser, in filter mode, generates lots
of UP, DOWN and EOF tree nodes. Apparently the tree parser inserts some
of these EOF nodes between the others. (I know this because the tree
parser calls my Pearl90TreeAdaptor to generate these nodes.)
Because of this added EOF, the rule can only match "'GOTO' EOF ID", which
I find very strange. Likewise, why doesn't the plain "LBL_DCL ID"
sequence match?
What am I doing wrong? Any suggestions to get this running without
putting this "magic" EOF in between?
Thanks in advance,
Stefan