[antlr-interest] rewrite rules cont.

Trevor John Thompson tijet at mac.com
Wed Oct 27 14:25:31 PDT 2010


On 2010 Oct 27, at 13:26, John B. Brodie wrote:

> On Tue, 2010-10-26 at 20:07 -0700, Trevor John Thompson wrote:
>> Greetings.
>> I continue to wrestle with rewrite rules for AST construction. I am trying to treat semicolon and newline as equivalent separators, and gather a sequence of expressions as children of a single AST node. The grammar looks like
>> =======
>> grammar Test;
>> options {output=AST;}
>> prog:	expr EOF!;
>> expr:	(term->term) (((NL|SC) term)+ -> ^(NL $expr term+))?;
>> term:	ID
>> 	|	->ID	// empty treated as no-name ID
>> 	;
>> fragment
>> SP	:	' '|'\t';
>> SC	:	';';
>> ID	:	SP*
>> 		('a'..'z'|'A'..'Z'|'_')
>> 		('0'..'9'|'a'..'z'|'A'..'Z'|'_')*
>> 	;
>> NL	:	('\r'|'\n')+;
>> =======
>> The problem is that if the sequence does *not* include newline, then i get RewriteEmptyStreamException on the NL in the rewrite rule; i.e. "a;\n" works, but "a;" does not.
>> 
>> What particularly baffles me is that if i build the node with any token other than NL or SC (e.g. SP), then the rule *always* works.
>> 
>> Could someone please explain what is going on?
> 
> ANTLR will create a root token when that token does not appear on the
> left hand side of the rewrite operator (the ->). this is known as an
> `imaginary token`. imaginary tokens do not appear in the input token
> stream.
> 
> But any token that appears on both sides of the -> must be present in
> the input token stream as you have encountered.
> 
> So you want to create an NL token as the root even though it might not
> appear in the input token stream. therefore:
> 
> expr:   term (((x=NL|x=SC) term)+ -> ^(NL[$x] term+))?;
> 
> the [...] stuff on the right hand side of the rewrite tells ANTLR to
> always construct a new imaginary token that is derived from a real
> token. the stuff inside the [] tells ANTLR how to initialize the
> imaginary token. so in the above case "a;" will end up with a tree whose
> root is actually a NL token.type but with a token.text of ";" and
> position information of the SC.
> 
> SP as root node worked because it did not appear on the left hand side
> of the rewrite so ANTLR just knew you wanted to construct an imaginary
> token (but with no text or position information initialized).
> 
> you really want to use the [] form of token construction so that the
> position information will get set so that later error messages will be
> (hopefully) more meaningful.
> 
> overriding the text of the NL to be ";" is, to me, rather unexpected. so
> i would suggest 
> 
> expr : term ((x=NL|x=SC) term)+ -> ^(EXPR_LIST[$x,"EXPR_LIST"] term+) ;
> 
> where EXPR_LIST is an imaginary token type that you have specified in
> the tokens{} section of your grammar.
> 
> hope this helps...
>   -jbb


Thank you very much for your clear, detailed, and thoughtful explanation.
I appreciate now that tree rewriting is not a matter of arbitrarily tossing around nodes; ANTLR carefully maintains internal information to assist the parser in generating good diagnostics. I may be catching on. . .
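
For the archive, here is an untested sketch of the original grammar with the suggested EXPR_LIST root applied. Keeping the optional separator sequence and the $expr reference from the original rule is an assumption layered on top of the suggestion; only the tokens{} section and the root of the rewrite differ from the grammar in the first message.
=======
grammar Test;
options {output=AST;}
tokens {EXPR_LIST;}	// imaginary token type; the lexer never produces it
prog:	expr EOF!;
// A lone term passes through unchanged. With one or more separators, the
// EXPR_LIST root takes its position info from the last separator matched ($x)
// and its text from the string literal.
expr:	(term->term)
	(((x=NL|x=SC) term)+ -> ^(EXPR_LIST[$x,"EXPR_LIST"] $expr term+))?
	;
term:	ID
	|	->ID	// empty treated as no-name ID
	;
fragment
SP	:	' '|'\t';
SC	:	';';
ID	:	SP*
		('a'..'z'|'A'..'Z'|'_')
		('0'..'9'|'a'..'z'|'A'..'Z'|'_')*
	;
NL	:	('\r'|'\n')+;
=======
Because EXPR_LIST never appears to the left of the ->, the rewrite should no longer depend on an NL token being present in the input, so "a;" and "a;\n" ought to behave the same way.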

TJ
--
Trevor John Thompson    net: tijet at mac.com

Quidquid Latine dictum sit, altum videtur.


