[antlr-interest] How to set imaginary token text?

Mon Jul 16 12:20:56 PDT 2007

On Monday 16 July 2007 12:04, Vaclav Barta wrote:
> On Sunday 15 July 2007 20:00, Vaclav Barta wrote:
> > On Sunday 15 July 2007 19:07, Randall R Schulz wrote:
> > > ...
> > > You might want to consider consolidating these characters, if
> > > that would work for your purposes:
>
> Experimenting some more, maybe I'd like to parse (some of) these
> characters individually but consolidate them into one AST node -
> something like

Let me clarify that it is at the lexical level that a 
token-per-character approach incurs potentially excessive overhead. For 
example, a whitespace rule that matches one whitespace character at a 
time, rather than collecting a whole run into a single token, could make 
a large difference in the number of Tokens constructed for a given 
input text.
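
To make the difference concrete, here is a rough sketch in plain Java. 
It is not ANTLR-generated code, and the class and method names 
(WhitespaceTokenCount, tokenize) are made up for illustration. It only 
counts how many pieces a toy scanner produces when whitespace is 
emitted one character at a time versus one run at a time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy scanner, not ANTLR's lexer: it splits the input into
// runs of non-whitespace and whitespace, and optionally emits each
// whitespace character as its own token.
class WhitespaceTokenCount {

    static List<String> tokenize(String input, boolean consolidateWhitespace) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            boolean ws = Character.isWhitespace(input.charAt(i));
            int start = i;
            i++;
            // Non-whitespace is always collected as a run. Whitespace is
            // collected as a run only when consolidation is requested.
            if (!ws || consolidateWhitespace) {
                while (i < input.length()
                        && Character.isWhitespace(input.charAt(i)) == ws) {
                    i++;
                }
            }
            tokens.add(input.substring(start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "a    b\n\n\nc";
        System.out.println(tokenize(text, false).size()); // prints 10
        System.out.println(tokenize(text, true).size());  // prints 5
    }
}
```

For this small input the per-character variant produces twice as many 
tokens, and the gap grows with the amount of indentation and blank 
lines in the text.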

> quotedString returns [ String value ]
> @init { StringBuffer sb = new StringBuffer(); }
>
> 	: DQUOTE (
> 		EscapeSequence { sb.append($EscapeSequence.getText()); }
> 		| BareString { sb.append($BareString.getText()); }
> 	)* DQUOTE { $value = sb.toString(); }
> 	;
>
> string
> 	: s = quotedString -> LITERAL
> 	| BareString -> LITERAL
>
> 	;
>
> where LITERAL is an imaginary token - but as written, it obviously
> loses the string value. How can I set LITERAL token text to the value
> returned from quotedString, or $BareString.getText() ?

Do you have TDAR (The Definitive ANTLR Reference)? If so, on page 176 
(paper) or page 188 (PDF), the notation for incorporating token 
references and / or token text into imaginary nodes is specified.

I have not used this mechanism, so I'm reluctant to try to either 
paraphrase or rewrite your grammar using these notations. Perhaps 
someone who knows better will supply the appropriate rules.

>  	Bye
>  		Vasek

Randall Schulz