[antlr-interest] Two common tree parser errors

mzukowski at yci.com mzukowski at yci.com
Wed Apr 9 10:09:33 PDT 2003


Anthony Youngman asked me to help debug his tree parser.

First I invoked antlr.Tool to instrument for tracing the tree parser:

java antlr.Tool -traceTreeParser BASIC.g

I run it on the sample input.  Anthony prints the AST before tree parsing:

 ( PROGRAM SHELL ( LOOP ( PRINT "X" : ) ( INPUT COMMAND ) ( PRINT ( :
"Command is " COMMAND ) ) ( IF ( EQ COMMAND "Q" ) THEN EXIT ) ( PRINT JUNK )
REPEAT ) ( PRINT JUNK2 ) END ) null

 > programst(PROGRAM)
  > statement(LOOP)
   > loopst(LOOP)
<AST>: unexpected AST node: PRINT

Hmm, lets look at the rules:
loopst 
	: #("LOOP" (statement)+ "REPEAT" ;

statement : ( inputst | printst | exitst | ifst | loopst | NULL ) ;

printst { boolean colon = false; }
	: #(PRINT expr ( COLONPRINT {colon = true;})?)
	...
Looks ok, but it's not matching properly.  Loop calls statement calls
printst, so what's the deal?  Could the token types be wrong between parser
and tree parser?  Unlikely since they are in the same file, but let's
inspect the BASICTokenTypes.txt file anyways:

LITERAL_PRINT="PRINT"=37
...
PRINT=41

Aha.  Literal tokens get a prefix prepended to their name.  ANTLR thought
printst was introducing a new token type named PRINT and created a new token
value for it because there were no quotes around PRINT.  Easily fixed:

printst { boolean colon = false; }
	: #("PRINT" expr ( COLONPRINT {colon = true;})?)
	...


Let's try again:


 ( PROGRAM SHELL ( LOOP ( PRINT "X" : ) ( INPUT COMMAND ) ( PRINT ( :
"Command is " COMMAND ) ) ( IF ( EQ COMMAND "Q" ) THEN EXIT ) ( PRINT JUNK )
REPEAT ) ( PRINT JUNK2 ) END ) null


 > programst(PROGRAM)
  > statement(LOOP)
   > loopst(LOOP)
    > statement(PRINT)
     > printst(PRINT)
      > expr("X")
       > logicexpr("X")
        > equalityexpr("X")
         > atom("X")
         < atom(:)
        < equalityexpr(:)
       < logicexpr(:)
      < expr(:)
     < printst(INPUT)
    < statement(INPUT)
    > statement(INPUT)
     > inputst(INPUT)
     < inputst(PRINT)
    < statement(PRINT)
    > statement(PRINT)
     > printst(PRINT)
      > expr(:)
       > catexpr(:)
        > atom("Command is ")
        < atom(COMMAND)
        > atom(COMMAND)
        < atom(null)
       < catexpr(null)
      < expr(null)
     < printst(IF)
    < statement(IF)
    > statement(IF)
     > ifst(IF)
      > logicexpr(PRINT)
       > equalityexpr(PRINT)
<AST>: unexpected AST node: PRINT
...

(Same message as before, but totally different meaning)
Ok, this is weird because you can see clearly above the tree fragment is 
( IF ( EQ COMMAND "Q" ) THEN EXIT ) ( PRINT JUNK )
But the tree parser is skipping to the PRINT, not going to the child (EQ
COMMAND "Q").  Inspecting the code shows:

ifst : ("IF" logicexpr "THEN" statement) ;

Sure enough he forgot the # which says to look at children of "IF".  So it
skips to the sibling of "IF" which is "PRINT" (and which is not referenced
by equalityexpr)

Fixed:

ifst : #("IF" logicexpr "THEN" statement) ;

Summary:

Print your tree and use tracing.
Inspect your token types files for mismatched or duplicated tokens.
Remember to always quote your literals.
Remember your # in front of root nodes.

Monty

 

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/ 




More information about the antlr-interest mailing list