[antlr-interest] Error nodes created upon syntax error

Sun Jan 6 08:48:03 PST 2008

Hello,
(sorry for my bad English)

There seems to be a problem with single-token deletion/insertion when
the parser is also building trees.
Take this rule, for example:

test	:	'var' ID ';'    -> ^('var' ID);

If the input is "var ;", the token insertion mechanism detects that the
token ID is missing and reports the error, but continues parsing.

If you look closely at the generated code, you will see:

-----
ID2=(Token)input.LT(1); // save LT(1) in ID2 before matching
match(input,ID,FOLLOW_ID_in_test26);
stream_ID.add(ID2); // ID2 holds the wrong token here
-----

ID2 contains a reference to the token ';' and not to an ID token,
because it is read from LT(1) before "match" runs. The "match"
procedure doesn't throw any exception because the token insertion
mechanism recovers from the error.

So the resulting tree will actually be ^('var' ';'), which is simply
incorrect. Am I right?
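
To make the problem easier to see, here is a minimal sketch that reproduces it outside ANTLR. Everything in it (Tok, Parser, the two-argument match) is a hypothetical toy model, not the ANTLR runtime; it only imitates "save LT(1), then match" together with a match that recovers by pretending the missing token was present.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are NOT the ANTLR runtime.
class StaleReferenceDemo {
    static final int VAR = 1, ID = 2, SEMI = 3;

    static final class Tok {
        final int type;
        final String text;
        Tok(int type, String text) { this.type = type; this.text = text; }
    }

    static final class Parser {
        final List<Tok> input;
        int p = 0;       // index of the current token, i.e. LT(1)
        int errors = 0;
        Parser(List<Tok> input) { this.input = input; }

        Tok LT(int i) { return input.get(p + i - 1); }

        // Imitates single-token insertion: if the current token is what may
        // follow the expected one, report an error and return without consuming.
        void match(int expected, int follow) {
            if (LT(1).type == expected) { p++; return; }
            if (LT(1).type == follow) { errors++; return; }
            throw new IllegalStateException("mismatched token");
        }
    }

    // Mirrors the generated code for: test : 'var' ID ';' -> ^('var' ID);
    static String parseTest(List<Tok> tokens) {
        Parser parser = new Parser(tokens);
        Tok var1 = parser.LT(1);
        parser.match(VAR, ID);
        Tok id2 = parser.LT(1);   // saved BEFORE match, as in the generated code
        parser.match(ID, SEMI);   // recovers silently when ID is missing
        parser.match(SEMI, -1);
        return "^(" + var1.text + " " + id2.text + ")";
    }

    public static void main(String[] args) {
        List<Tok> tokens = new ArrayList<>();
        tokens.add(new Tok(VAR, "var"));
        tokens.add(new Tok(SEMI, ";"));
        System.out.println(parseTest(tokens)); // prints "^(var ;)"
    }
}
```

With the input "var ;" the saved reference points at ';', so the tree is built from the wrong token even though the error was reported.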

If I want to use token deletion/insertion together with tree
building, can I modify the "match" procedure so that it changes, for
instance, the content of ID2 (without altering the reference)?

I have thought of a workaround. (LA(i) is the token at index
current_pos+i in the stream.)

If there is a token insertion, do this in the "match" procedure:

1. Add the "special" imaginary token (standing in for the missing token)
to the stream at position LA(2) (its position is wrong for now). The
stream must allow token insertion.
2. Swap the contents (not the references) of LA(1) and LA(2). (The
index information has to be corrected as well.)
3. ID2 still refers to LA(1), but the content of that token is
now the "special imaginary ID token".
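
The insertion steps can be sketched roughly as follows. This is only an illustration under assumptions: MutableToken, swapContents and insertMissing are hypothetical names, the stream is a plain mutable list, and the token index bookkeeping mentioned in step 2 is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the insertion workaround; not ANTLR runtime code.
class InsertionWorkaroundSketch {
    static final int VAR = 1, ID = 2, SEMI = 3;

    static final class MutableToken {
        int type;
        String text;
        boolean imaginary;   // true for the "special" placeholder token
        MutableToken(int type, String text, boolean imaginary) {
            this.type = type; this.text = text; this.imaginary = imaginary;
        }
    }

    // Exchanges the contents of two token objects; the objects stay in place.
    static void swapContents(MutableToken a, MutableToken b) {
        int type = a.type; String text = a.text; boolean imaginary = a.imaginary;
        a.type = b.type; a.text = b.text; a.imaginary = b.imaginary;
        b.type = type; b.text = text; b.imaginary = imaginary;
    }

    // p is the list index of LA(1).
    static void insertMissing(List<MutableToken> stream, int p, int expectedType) {
        MutableToken missing = new MutableToken(expectedType, "<missing>", true);
        stream.add(p + 1, missing);                       // step 1: new object at LA(2)
        swapContents(stream.get(p), stream.get(p + 1));   // step 2: swap contents
    }

    static String demo() {
        List<MutableToken> stream = new ArrayList<>();
        stream.add(new MutableToken(VAR, "var", false));
        stream.add(new MutableToken(SEMI, ";", false));
        MutableToken id2 = stream.get(1);   // reference saved before "match"
        insertMissing(stream, 1, ID);
        // step 3: id2 is the same object, but it now carries the placeholder
        return "type=" + id2.type + " imaginary=" + id2.imaginary
                + " next=" + stream.get(2).text;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "type=2 imaginary=true next=;"
    }
}
```

The saved reference is never reassigned; only the object it points to changes, which is why the generated code would not need to be touched.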

If there is a token deletion, do this in the "match" procedure:

1. Save the content of LA(1) in a temporary variable: temp_var
2. Copy the content of LA(2) into LA(1)
3. Copy the content of temp_var into LA(2)
4. Swap (references only) LA(1) and LA(2)
5. ID2 still refers to the **old** LA(1) object, which now sits at
LA(2) and holds the content of the expected token.
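
A rough sketch of the deletion steps, again under assumptions: MutableToken and moveExpectedIntoSavedReference are hypothetical names, the stream is a plain mutable list, and index bookkeeping is ignored. The example input is "var 42 x ;", where 42 is the extra token.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of the deletion workaround; not ANTLR runtime code.
class DeletionWorkaroundSketch {
    static final int VAR = 1, ID = 2, SEMI = 3, INT = 4;

    static final class MutableToken {
        int type;
        String text;
        MutableToken(int type, String text) { this.type = type; this.text = text; }
    }

    // Steps 1 to 3: exchange contents through a temporary copy.
    static void swapContents(MutableToken a, MutableToken b) {
        int type = a.type; String text = a.text;
        a.type = b.type; a.text = b.text;
        b.type = type; b.text = text;
    }

    // p is the list index of LA(1), the extra token that will be deleted.
    static void moveExpectedIntoSavedReference(List<MutableToken> stream, int p) {
        swapContents(stream.get(p), stream.get(p + 1)); // steps 1-3
        Collections.swap(stream, p, p + 1);             // step 4: swap references
    }

    static String demo() {
        List<MutableToken> stream = new ArrayList<>();
        stream.add(new MutableToken(VAR, "var"));
        stream.add(new MutableToken(INT, "42"));   // extra token
        stream.add(new MutableToken(ID, "x"));     // the token the rule expects
        stream.add(new MutableToken(SEMI, ";"));
        MutableToken id2 = stream.get(1);          // reference saved before "match"
        moveExpectedIntoSavedReference(stream, 1);
        // step 5: id2 now sits at LA(2) and carries the ID content
        return "id2=" + id2.text + " skipped=" + stream.get(1).text
                + " same=" + (stream.get(2) == id2);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "id2=x skipped=42 same=true"
    }
}
```

After the two swaps the stream still reads "var 42 x ;" in content order, so "match" can skip LA(1) and consume LA(2) as usual.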

In the CommonTreeAdaptor.create procedure:

1. If the token is a "special" imaginary token: return an ERROR node
(as in Terence's proposal)
2. Otherwise: create a node as usual.
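
The adaptor change could look roughly like this. To keep the sketch self-contained it does not extend the real CommonTreeAdaptor; Tok, Node, ErrorNode and the "imaginary" flag are all hypothetical stand-ins.

```java
// Sketch of an adaptor-style factory that turns placeholder tokens into
// error nodes; not ANTLR runtime code.
class ErrorNodeAdaptorSketch {
    static final class Tok {
        final String text;
        final boolean imaginary;   // true for the "special" placeholder token
        Tok(String text, boolean imaginary) { this.text = text; this.imaginary = imaginary; }
    }

    static class Node {
        final Tok token;
        Node(Tok token) { this.token = token; }
        @Override public String toString() { return token.text; }
    }

    static final class ErrorNode extends Node {
        ErrorNode(Tok token) { super(token); }
        @Override public String toString() { return "<error: " + token.text + ">"; }
    }

    // Case 1: placeholder token gives an ERROR node; case 2: a normal node.
    static Node create(Tok payload) {
        return payload.imaginary ? new ErrorNode(payload) : new Node(payload);
    }

    public static void main(String[] args) {
        System.out.println(create(new Tok("x", false)));          // prints "x"
        System.out.println(create(new Tok("missing ID", true)));  // prints "<error: missing ID>"
    }
}
```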

What do you think about this (untested) workaround?

The best solution, I think, would be for the "match" procedure to
return a reference to the token that was actually matched.
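
A minimal sketch of that idea, using a toy parser rather than the ANTLR runtime. The three-argument match, the Tok class and the "<missing ...>" text are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: shows a match() that returns the token it matched.
class ReturningMatchSketch {
    static final int VAR = 1, ID = 2, SEMI = 3;

    static final class Tok {
        final int type;
        final String text;
        Tok(int type, String text) { this.type = type; this.text = text; }
    }

    static final class Parser {
        final List<Tok> input;
        int p = 0;   // index of the current token, i.e. LT(1)
        Parser(List<Tok> input) { this.input = input; }

        Tok LT(int i) { return input.get(p + i - 1); }

        // Returns the consumed token, or a placeholder when the expected
        // token is missing and the current token is what may follow it.
        Tok match(int expected, int follow, String name) {
            Tok current = LT(1);
            if (current.type == expected) { p++; return current; }
            if (current.type == follow) {
                return new Tok(expected, "<missing " + name + ">");
            }
            throw new IllegalStateException("mismatched token");
        }
    }

    // The rule body uses the return value instead of a saved LT(1).
    static String parseTest(List<Tok> tokens) {
        Parser parser = new Parser(tokens);
        Tok var1 = parser.match(VAR, ID, "'var'");
        Tok id2 = parser.match(ID, SEMI, "ID");
        parser.match(SEMI, -1, "';'");
        return "^(" + var1.text + " " + id2.text + ")";
    }

    public static void main(String[] args) {
        List<Tok> tokens = new ArrayList<>();
        tokens.add(new Tok(VAR, "var"));
        tokens.add(new Tok(SEMI, ";"));
        System.out.println(parseTest(tokens)); // prints "^(var <missing ID>)"
    }
}
```

With this shape the stream itself never has to be modified, and a tree adaptor can recognise the placeholder and build an ERROR node from it.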

On Dec 2, 2007 8:24 PM, Terence Parr <parrt at antlr.org> wrote:
> hi,
>
> Currently syntax errors cause invalid trees and possibly even runtime
> exceptions when building ASTs. What we really need I believe is to
> have rules that encounter syntax errors return an ERROR node of some
> sort that records where the error occurred and, with luck, the tokens
> consumed during recovery. I started an improvement request:
>
> http://www.antlr.org:8888/browse/ANTLR-193
>
> The basic idea is that ERROR nodes get used in place of ASTs that
> would normally be produced by rule invocations.  For example, the
> following rule would return a valid AST except for the subtrees
> associated with rule refs that encounter syntax errors:
>
> forDecl : 'for' '(' decl ';' expr ';' expr ')' stat -> ... ;
>
> If there is an error inside decl, the tree would return
>
> ^('for' ERROR subtree-expr subtree-expr)
>
> This effectively means that I must turn off the single token
> insertion and deletion that occurs automatically within a single
> rule.  If a syntax error occurs, the immediately surrounding rule
> must terminate and return an error node.
>
> Does this make sense? I would like to stick this into 3.1 release.
>
> Ter
>