[antlr-interest] combine tokens in rewrite rule

Fri Nov 9 15:31:32 PST 2007

I understand and agree with using AST sub-trees for fully qualified
identifiers in complex languages. However, that is not what either the
poster of the referenced thread or this thread's original poster asked
for. They asked for one fully qualified name as one token in one node,
which would be a lexical solution. I did not want to try to redefine
their language.

The solution you gave in the referenced thread specified:
  1. Declare an "imaginary" token in the tokens {} section.
  2. Accumulate the text of the individual IDs and dots ('.') in some
     unspecified manner.
  3. Rewrite the rule as the imaginary token, with its text set to the
     concatenated text in some unspecified manner.

What confuses me is why an "imaginary" token is needed, precisely how
steps 2 and 3 are performed, and how such a solution would differ from
using lexical fragments as I demonstrated.
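
To make the comparison concrete, here is a rough sketch in plain Java,
with no ANTLR runtime involved, of how I picture the two approaches. The
class and method names are made up for illustration, and the regular
expressions are only my approximation of the ID rule in my grammar quoted
below. This is not the code ANTLR generates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names come from ANTLR.
class QualifiedIdSketch {
    // Approximates: fragment UnqualifiedID
    private static final String UNQUALIFIED = "[A-Za-z_][A-Za-z_0-9]*";

    // Approximates: ID: UnqualifiedID ('.' UnqualifiedID)*;
    private static final Pattern QUALIFIED_ID =
        Pattern.compile(UNQUALIFIED + "(?:\\." + UNQUALIFIED + ")*");

    // Parser-style view: each unqualified ID and each '.' is its own token.
    private static final Pattern PIECE =
        Pattern.compile(UNQUALIFIED + "|\\.");

    // Lexical approach: every fully qualified name comes out as ONE token.
    static List<String> lexWhole(String input) {
        List<String> tokens = new ArrayList<String>();
        Matcher m = QUALIFIED_ID.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Parser approach, first half: the pieces stay separate tokens.
    static List<String> lexPieces(String input) {
        List<String> pieces = new ArrayList<String>();
        Matcher m = PIECE.matcher(input);
        while (m.find()) {
            pieces.add(m.group());
        }
        return pieces;
    }

    // Parser approach, second half: concatenate the text of the pieces,
    // as one would before handing it to a single (imaginary) token.
    static String concat(List<String> pieces) {
        StringBuilder sb = new StringBuilder();
        for (String piece : pieces) {
            sb.append(piece);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "name\nname.subName1\nname.subName1.subName2.subName3\n";
        for (String token : lexWhole(input)) {
            List<String> pieces = lexPieces(token);
            System.out.println(token + " <- " + pieces + " -> " + concat(pieces));
        }
    }
}
```

Both routes end with the same text in a single string. The only
difference I can see is that the second route still has the individual
components on hand before they are joined.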

Could you provide a concrete example grammar? You got me all curious now. :)

Thanks
-- Curtis

Jim Idle wrote:
> But that is a lexical solution. When '.' is used in many places it is 
> quite often a better bet to have the parser determine the pieces of a 
> valid reference and in many cases you need the individual components 
> because the meanings change according to context.
> 
> For instance x.y could be an enumeration, or a property reference or 
> something else.
> 
> All that needs to be done is to take the .text of each element of the ID 
> and concatenate them. To be honest, I would probably not even do that in 
> the parser, but in the tree parser, where you probably have the 
> contextual information available (and may well not have in the parser). 
> Then the rewrite would be something like ->^(IDEXPR $ids+ ) or some such.
> 
> Jim
> 
> -----Original Message-----
> From: Curtis Clauson [mailto:NOSPAM at TheSnakePitDev.com] 
> Sent: Friday, November 09, 2007 1:56 PM
> To: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] combine tokens in rewrite rule
> 
> I must admit, I am somewhat confused by the answer given in the 
> referenced thread. Doesn't the use of fragment lexer rules solve this?
> 
> For example, the grammar below will parse this input
> <<
> name
> name.subName1
> name.subName1.subName2.subName3
>  >>
> into a tree that has three ID nodes under the root nil node, each with 
> the complete qualified ID text as a single token. Is this what you mean, 
> or have I missed something?
> 
> (tested with ANTLR v3.0.1 and ANTLRWorks v1.1.4)
> ----------
> grammar ABer1;
> 
> options {
>      output = AST;
> }
> 
> 
> unit: ID+;
> 
> 
> ID: UnqualifiedID ('.' UnqualifiedID)*;
> WS: (' ' | '\t' | '\r' | '\n' | '\f')+ {$channel = HIDDEN;};
> 
> 
> fragment UnqualifiedID     : UnqualifiedIDFirst (UnqualifiedIDRest)*;
> fragment UnqualifiedIDFirst: 'a'..'z' | 'A'..'Z' | '_';
> fragment UnqualifiedIDRest : 'a'..'z' | 'A'..'Z' | '_' | '0'..'9';
> ----------
> 
> I hope that helps.
> -- Curtis
> 
> 
> Adrian Ber wrote:
>> Hi all,
>>
>> I want to find a way to combine multiple tokens into a single one.
>> I've searched the archive and found this thread: 
>> http://www.antlr.org/pipermail/antlr-interest/2007-January/019161.html.
>> Do any of you have short sample code showing how to do it?