[antlr-interest] rewrite rules cont.

John B. Brodie jbb at acm.org
Wed Oct 27 13:26:02 PDT 2010


On Tue, 2010-10-26 at 20:07 -0700, Trevor John Thompson wrote:
> Greetings.
> I continue to wrestle with rewrite rules for AST construction. I am trying to treat semicolon and newline as equivalent separators, and gather a sequence of expressions as children of a single AST node. The grammar looks like
> =======
> grammar Test;
> options {output=AST;}
> prog:	expr EOF!;
> expr:	(term->term) (((NL|SC) term)+ -> ^(NL $expr term+))?;
> term:	ID
> 	|	->ID	// empty treated as no-name ID
> 	;
> fragment
> SP	:	' '|'\t';
> SC	:	';';
> ID	:	SP*
> 		('a'..'z'|'A'..'Z'|'_')
> 		('0'..'9'|'a'..'z'|'A'..'Z'|'_')*
> 	;
> NL	:	('\r'|'\n')+;
> =======
> The problem is that if the sequence does *not* include newline, then I get RewriteEmptyStreamException on the NL in the rewrite rule; i.e. "a;\n" works, but "a;" does not.
> 
> What particularly baffles me is that if I build the node with any token other than NL or SC (e.g. SP), then the rule *always* works.
> 
> Could someone please explain what is going on?

ANTLR will create a root token when that token does not appear on the
left hand side of the rewrite operator (the ->). This is known as an
`imaginary token`. Imaginary tokens do not appear in the input token
stream.

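But any token that appears on both sides of the -> must actually have
been matched in the input token stream. With "a;" only the SC
alternative is matched, so there is no NL for the rewrite to use, hence
the RewriteEmptyStreamException you have encountered.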

So you want to create a NL token as the root, even though one may not
have been matched in the input token stream. Therefore:

expr:   term (((x=NL|x=SC) term)+ -> ^(NL[$x] term+))?;

The [...] stuff on the right hand side of the rewrite tells ANTLR to
always construct a new imaginary token that is derived from a real
token. The stuff inside the [] tells ANTLR how to initialize the
imaginary token. So in the above case "a;" will end up with a tree whose
root has a token type of NL but a token text of ";" and the position
information of the SC.

SP as root node worked because it did not appear on the left hand side
of the rewrite so ANTLR just knew you wanted to construct an imaginary
token (but with no text or position information initialized).

You really want to use the [] form of token construction so that the
position information will get set and later error messages will be
(hopefully) more meaningful.

Overriding the text of the NL to be ";" is, to me, rather unexpected. So
I would suggest

expr : term ((x=NL|x=SC) term)+ -> ^(EXPR_LIST[$x,"EXPR_LIST"] term+) ;

where EXPR_LIST is an imaginary token type that you have specified in
the tokens{} section of your grammar.
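
For completeness, here is a minimal sketch of where that declaration
might go in the Test grammar from the question, assuming ANTLR 3 syntax.
Only the tokens{} line is new; the expr rule is copied unchanged from
above, and everything else stays as in the original grammar.
=======
grammar Test;
options {output=AST;}
tokens  {EXPR_LIST;}	// imaginary token type: no lexer rule produces it

prog:	expr EOF!;
// root is always a fresh EXPR_LIST node; $x supplies position info only
expr : term ((x=NL|x=SC) term)+ -> ^(EXPR_LIST[$x,"EXPR_LIST"] term+) ;
// term, SP, SC, ID and NL are unchanged from the original grammar
=======
Note that, as written, this expr rule requires at least one separator.
If a lone term should still match, the optional (...)? wrapper from the
earlier rule can be kept around the separator loop and its rewrite.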

Hope this helps...
   -jbb
