[antlr-interest] Two common tree parser errors

Anthony W Youngman Anthony.Youngman at ECA-International.com
Thu Apr 10 01:46:42 PDT 2003


Thanks Monty (thanks in public for a lot of private help).

And thanks for yet another tip! Okay, I should try to read the docu and
pick up on this but it's quite hard to find - next time I shall have to
try with -traceTreeParse, which I didn't even know existed!

Thanks a lot
Wol

-----Original Message-----
From: mzukowski at yci.com [mailto:mzukowski at yci.com] 
Sent: 09 April 2003 18:10
To: antlr-interest at yahoogroups.com
Subject: [antlr-interest] Two common tree parser errors


Anthony Youngman asked me to help debug his tree parser.

First I invoked antlr.Tool to instrument for tracing the tree parser:

java antlr.Tool -traceTreeParser BASIC.g

I run it on the sample input.  Anthony prints the AST before tree
parsing:

 ( PROGRAM SHELL ( LOOP ( PRINT "X" : ) ( INPUT COMMAND ) ( PRINT ( :
"Command is " COMMAND ) ) ( IF ( EQ COMMAND "Q" ) THEN EXIT ) ( PRINT
JUNK )
REPEAT ) ( PRINT JUNK2 ) END ) null

 > programst(PROGRAM)
  > statement(LOOP)
   > loopst(LOOP)
<AST>: unexpected AST node: PRINT

Hmm, lets look at the rules:
loopst 
	: #("LOOP" (statement)+ "REPEAT" ;

statement : ( inputst | printst | exitst | ifst | loopst | NULL ) ;

printst { boolean colon = false; }
	: #(PRINT expr ( COLONPRINT {colon = true;})?)
	...
Looks ok, but it's not matching properly.  Loop calls statement calls
printst, so what's the deal?  Could the token types be wrong between
parser
and tree parser?  Unlikely since they are in the same file, but let's
inspect the BASICTokenTypes.txt file anyways:

LITERAL_PRINT="PRINT"=37
...
PRINT=41

Aha.  Literal tokens get a prefix prepended to their name.  ANTLR
thought
printst was introducing a new token type named PRINT and created a new
token
value for it because there were no quotes around PRINT.  Easily fixed:

printst { boolean colon = false; }
	: #("PRINT" expr ( COLONPRINT {colon = true;})?)
	...


Let's try again:


 ( PROGRAM SHELL ( LOOP ( PRINT "X" : ) ( INPUT COMMAND ) ( PRINT ( :
"Command is " COMMAND ) ) ( IF ( EQ COMMAND "Q" ) THEN EXIT ) ( PRINT
JUNK )
REPEAT ) ( PRINT JUNK2 ) END ) null


 > programst(PROGRAM)
  > statement(LOOP)
   > loopst(LOOP)
    > statement(PRINT)
     > printst(PRINT)
      > expr("X")
       > logicexpr("X")
        > equalityexpr("X")
         > atom("X")
         < atom(:)
        < equalityexpr(:)
       < logicexpr(:)
      < expr(:)
     < printst(INPUT)
    < statement(INPUT)
    > statement(INPUT)
     > inputst(INPUT)
     < inputst(PRINT)
    < statement(PRINT)
    > statement(PRINT)
     > printst(PRINT)
      > expr(:)
       > catexpr(:)
        > atom("Command is ")
        < atom(COMMAND)
        > atom(COMMAND)
        < atom(null)
       < catexpr(null)
      < expr(null)
     < printst(IF)
    < statement(IF)
    > statement(IF)
     > ifst(IF)
      > logicexpr(PRINT)
       > equalityexpr(PRINT)
<AST>: unexpected AST node: PRINT
...

(Same message as before, but totally different meaning)
Ok, this is weird because you can see clearly above the tree fragment is

( IF ( EQ COMMAND "Q" ) THEN EXIT ) ( PRINT JUNK )
But the tree parser is skipping to the PRINT, not going to the child (EQ
COMMAND "Q").  Inspecting the code shows:

ifst : ("IF" logicexpr "THEN" statement) ;

Sure enough he forgot the # which says to look at children of "IF".  So
it
skips to the sibling of "IF" which is "PRINT" (and which is not
referenced
by equalityexpr)

Fixed:

ifst : #("IF" logicexpr "THEN" statement) ;

Summary:

Print your tree and use tracing.
Inspect your token types files for mismatched or duplicated tokens.
Remember to always quote your literals.
Remember your # in front of root nodes.

Monty

 

Your use of Yahoo! Groups is subject to
http://docs.yahoo.com/info/terms/ 



 

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/ 

-------------- next part --------------
This transmission is intended for the named recipient only. It may contain private and confidential information. If this has come to you in error you must not act on anything disclosed in it, nor must you copy it, modify it, disseminate it in any way, or show it to anyone. Please e-mail the sender to inform us of the transmission error or telephone ECA International immediately and delete the e-mail from your information system.

Telephone numbers for ECA International offices are: Sydney +61 (0)2 9911 7799, Hong Kong + 852 2121 2388, London +44 (0)20 7351 5000 and New York +1 212 582 2333.


More information about the antlr-interest mailing list