[antlr-interest] Error nodes created upon syntax error

Fri Jan 11 11:18:42 PST 2008

Hi Alessandro, thanks for the suggestion. Yes, I've been thinking
about this problem, and it is even more general. What do you do about
actions that must execute after recovery even though they refer to a
token that does not exist?!

The unfortunate truth comes down to the following: single-token
insertion and deletion recovery within an alternative is really sexy
for journal papers, but I believe I've convinced myself that it is
not practical. Well, at least not in the presence of actions.

The simple solution is to turn this off, relying on a normal "exit
rule upon syntax error" mechanism, but to leave the insertion and
deletion mechanism available as an option by overriding methods.
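
To make the idea concrete, here is a minimal sketch of what "exit the rule by default, opt in to repair by overriding" could look like. It is a toy recognizer written for this note, not the ANTLR runtime: ToyParser, DeletingParser, SyntaxError, and recoverFromMismatch are all hypothetical names, and tokens are plain strings whose text doubles as their type.

```java
import java.util.Arrays;
import java.util.List;

/** Toy stand-in for a generated parser; none of these names are ANTLR runtime API. */
class ToyParser {
    /** Signals a syntax error; the enclosing rule catches it and exits. */
    static class SyntaxError extends RuntimeException {
        SyntaxError(String msg) { super(msg); }
    }

    protected final List<String> tokens;
    protected int pos = 0;

    ToyParser(List<String> tokens) { this.tokens = tokens; }

    /** Lookahead: the i-th token from the current position (1-based). */
    String la(int i) {
        int k = pos + i - 1;
        return k < tokens.size() ? tokens.get(k) : "<EOF>";
    }

    /** Consume and return the expected token, or hand off to the recovery hook. */
    String match(String expected) {
        if (la(1).equals(expected)) {
            return tokens.get(pos++);
        }
        return recoverFromMismatch(expected);
    }

    /** Default policy: no in-rule repair, just fail so the rule exits. */
    protected String recoverFromMismatch(String expected) {
        throw new SyntaxError("expected " + expected + " but found " + la(1));
    }

    /** test : 'var' ID ';' ; returns the ID token, or null if the rule bailed out. */
    String test() {
        try {
            match("var");
            String id = match("ID");
            match(";");
            return id;
        } catch (SyntaxError e) {
            return null; // code after the failed match never runs
        }
    }

    public static void main(String[] args) {
        System.out.println(new ToyParser(Arrays.asList("var", ";")).test());
        System.out.println(new DeletingParser(Arrays.asList("var", "x", "ID", ";")).test());
    }
}

/** Opt-in subclass that restores single-token deletion by overriding the hook. */
class DeletingParser extends ToyParser {
    DeletingParser(List<String> tokens) { super(tokens); }

    @Override
    protected String recoverFromMismatch(String expected) {
        if (la(2).equals(expected)) { // an extra token sits in front of the expected one
            pos++;                    // skip the extra token
            return tokens.get(pos++); // consume the expected one
        }
        return super.recoverFromMismatch(expected);
    }
}
```

With the default policy, no action ever sees a token that does not exist, because the rule exits before the action runs. Anyone who still wants in-rule repair takes on that risk explicitly by subclassing.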

Ter

On Jan 6, 2008, at 8:48 AM, Alessandro wrote:

> Hello,
> (sorry for my bad English)
>
> I can see that there is a problem with token deletion/insertion if you
> are also building trees.
> Take this rule, for example:
>
> test	:	'var' ID ';'    -> ^('var' ID);
>
> If the input is "var ;", the token insertion system detects that the
> token "ID" is missing, then reports the error, but continues parsing.
>
> If you look closely at the generated code, you will see:
>
> -----
> ID2=(Token)input.LT(1); // save ID2
> match(input,ID,FOLLOW_ID_in_test26);
> stream_ID.add(ID2); // ID2 holds a bad reference
> -----
>
> ID2 contains a reference to the token ';' and not to the token ID. The
> "match" procedure doesn't throw any exception because of the "token
> insertion" system.
>
> So the resulting tree will in reality be ^('var' ';'), which is
> totally incorrect, am I right?
>
> If I want to use the "token deletion/insertion" system with tree
> building, can I modify the "match" procedure so that it modifies, for
> instance, the content of "ID2" (without altering the reference)?
>
> I imagined a workaround. (LA(i) is the token at the index
> current_pos+i in the stream.)
>
> If there is a token insertion, do this in the "match" procedure:
>
> 1. Add the "special" imaginary token (matching the missing token) to
> the stream at position LA(2) (the position is wrong for now). The
> stream must allow token insertion.
> 2. Swap (contents and not references) LA(1) and LA(2). (You have to
> correct the index information.)
> 3. ID2 still has a reference to LA(1), but the content of the token is
> now "special imaginary ID token".
>
>
> If there is a token deletion, do this in the "match" procedure:
>
> 1. Save the content of LA(1) to a temporary variable: temp_var
> 2. Copy the content of LA(2) into LA(1)
> 3. Copy the content of temp_var into LA(2)
> 4. Swap (references only) LA(1) and LA(2)
> 5. ID2 has a reference to the **OLD** LA(1), which is now LA(2).
>
>
> In the CommonTreeAdaptor.create procedure:
>
> 1. If the token is a "special" imaginary token: return an ERROR node
> (as in Terence's proposal).
> 2. Else: create a node as usual.
>
> What do you think about this (untested) workaround?
>
> The best solution, I think, is for the "match" procedure to return a
> reference to the token that was actually matched.
>
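
The label problem described above, and the "match returns the matched token" fix, can be reproduced with a small self-contained model. This is only a sketch under stated assumptions: LabelDemo, Tok, and this two-argument match are invented for illustration and are not the ANTLR runtime. Insertion is simplified to "if LT(1) is the single token expected to follow, conjure up the missing one".

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the label problem; Tok and match(...) are not ANTLR runtime API. */
class LabelDemo {
    static class Tok {
        final String type;
        final boolean imaginary; // true when created by token insertion
        Tok(String type, boolean imaginary) { this.type = type; this.imaginary = imaginary; }
        @Override public String toString() { return imaginary ? "<missing " + type + ">" : type; }
    }

    final List<Tok> input = new ArrayList<>();
    int pos = 0;

    LabelDemo(String... types) {
        for (String t : types) input.add(new Tok(t, false));
    }

    /** The i-th lookahead token (1-based), or an EOF token past the end. */
    Tok lt(int i) {
        int k = pos + i - 1;
        return k < input.size() ? input.get(k) : new Tok("<EOF>", false);
    }

    /** Match with simplified single-token insertion; returns the token matched. */
    Tok match(String type, String follow) {
        if (lt(1).type.equals(type)) {
            return input.get(pos++);
        }
        if (lt(1).type.equals(follow)) {
            return new Tok(type, true); // pretend the token was there; consume nothing
        }
        throw new IllegalStateException("expected " + type + " but found " + lt(1));
    }

    /** Pattern from the quoted generated code: the label is read before match. */
    String ruleLabelBeforeMatch() {
        match("var", "ID");
        Tok id = lt(1); // captured before we know whether match succeeds
        match("ID", ";");
        match(";", "<EOF>");
        return "^(var " + id + ")";
    }

    /** Alternative: the label is whatever match returns. */
    String ruleLabelFromMatch() {
        match("var", "ID");
        Tok id = match("ID", ";");
        match(";", "<EOF>");
        return "^(var " + id + ")";
    }

    public static void main(String[] args) {
        System.out.println(new LabelDemo("var", ";").ruleLabelBeforeMatch());
        System.out.println(new LabelDemo("var", ";").ruleLabelFromMatch());
    }
}
```

On the input "var ;" the first rule builds ^(var ;), which is the wrong tree described in the message above. The second builds ^(var <missing ID>), and a tree adaptor could then turn that placeholder into an ERROR node.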
> On Dec 2, 2007 8:24 PM, Terence Parr <parrt at antlr.org> wrote:
>> hi,
>>
>> Currently syntax errors cause invalid trees and possibly even runtime
>> exceptions when building ASTs. What we really need I believe is to
>> have rules that encounter syntax errors return an ERROR node of some
>> sort that records where the error occurred and, with luck, the tokens
>> consumed during recovery. I started an improvement request:
>>
>> http://www.antlr.org:8888/browse/ANTLR-193
>>
>> The basic idea is that ERROR nodes get used in place of ASTs that
>> would normally be produced by rule invocations.  For example, the
>> following rule would return a valid AST except for the subtrees
>> associated with rule refs that encounter syntax errors:
>>
>> forDecl : 'for' '(' decl ';' expr ';' expr ')' stat -> ... ;
>>
>> If there is an error inside decl, the tree would return
>>
>> ^('for' ERROR subtree-expr subtree-expr)
>>
>> This effectively means that I must turn off the single token
>> insertion and deletion that occurs automatically within a single
>> rule.  If a syntax error occurs, the immediately surrounding rule
>> must terminate and return an error node.
>>
>> Does this make sense? I would like to stick this into 3.1 release.
>>
>> Ter
>>
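
The ERROR-node proposal quoted above can be sketched in the same toy style. Everything here is an assumption made for illustration: ErrorNodeDemo, Node, and ErrorNode are invented names, the grammar is reduced to forDecl : 'for' '(' decl ';' expr ';' expr ')' with decl : 'int' ID and expr : ID, and recovery simply skips ahead to the ';' that follows decl.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy tree construction; Node and the rule methods are not ANTLR runtime API. */
class ErrorNodeDemo {
    static class Node {
        final String text;
        final List<Node> children = new ArrayList<>();
        Node(String text) { this.text = text; }
        Node add(Node child) { children.add(child); return this; }
        @Override public String toString() {
            if (children.isEmpty()) return text;
            StringBuilder sb = new StringBuilder("^(" + text);
            for (Node c : children) sb.append(' ').append(c);
            return sb.append(')').toString();
        }
    }

    /** Error node recording the token index range covered by the failed rule. */
    static class ErrorNode extends Node {
        final int start, stop;
        ErrorNode(int start, int stop) { super("ERROR"); this.start = start; this.stop = stop; }
    }

    static class SyntaxError extends RuntimeException {}

    final String[] tokens;
    int pos = 0;

    ErrorNodeDemo(String... tokens) { this.tokens = tokens; }

    /** No in-rule repair: a mismatch always raises SyntaxError. */
    String match(String expected) {
        if (pos < tokens.length && tokens[pos].equals(expected)) return tokens[pos++];
        throw new SyntaxError();
    }

    /** decl : 'int' ID ; on error, skip to ';' and return an ERROR node instead. */
    Node decl() {
        int start = pos;
        try {
            match("int");
            return new Node("int").add(new Node(match("ID")));
        } catch (SyntaxError e) {
            while (pos < tokens.length && !tokens[pos].equals(";")) pos++; // resync
            return new ErrorNode(start, pos - 1);
        }
    }

    Node expr() { return new Node(match("ID")); }

    /** forDecl keeps building its tree around whatever decl returned. */
    Node forDecl() {
        match("for"); match("(");
        Node d = decl(); match(";");
        Node e1 = expr(); match(";");
        Node e2 = expr(); match(")");
        return new Node("for").add(d).add(e1).add(e2);
    }

    public static void main(String[] args) {
        System.out.println(new ErrorNodeDemo(
            "for", "(", "int", "int", ";", "ID", ";", "ID", ")").forDecl());
    }
}
```

For the malformed declaration "int int", the sketch yields ^(for ERROR ID ID). The error node records that tokens 2 through 3 were consumed during recovery, while the rest of the tree stays intact.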