[antlr-interest] combine tokens in rewrite rule

Fri Nov 9 19:33:53 PST 2007

"When addressing a student, learn, and do not assume, their needs."
  - One of "Rules To Teach By" ;)

Ok, I get it now. Each of these techniques produces the same result: one 
node, one token, the fully-qualified identifier text (if desired). 
However, they differ in what it takes to match the identifier.

If the fully-qualified identifier can be matched by the lexer, then the 
lexical solution is fine-'n-dandy.
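
For illustration, the lexical version might look like the ANTLR 3 lexer 
rules below. This is a minimal sketch, not something from the thread: the 
rule names QUALIFIED_ID and ID_PART are hypothetical. It also assumes '.' 
only ever appears between name parts, so the dot never needs to be a 
token of its own.

```antlr
// Lexer rules (combined or lexer grammar). Assumes '.' only ever
// appears between name parts, so it never needs a token of its own.
QUALIFIED_ID
    :   ID_PART ('.' ID_PART)*
    ;

// fragment: usable only from other lexer rules, never emitted as a token
fragment
ID_PART
    :   ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;
```

The parser then sees a single token whose text is the whole name, so 
anything that needs the individual parts has to split that text again.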

If you need the parser to match it, then you can use a tree rewrite rule 
in the parser grammar to create the node as a previously declared token, 
with the matched text assembled in whatever coded manner you like (which 
is what you meant by "rewrite the rule as the imaginary token"). That 
would make this solution, uhm, Jim-dandy. <chuckle>
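
A sketch of that parser-side approach (option 2 in Jim's list) in ANTLR 3 
syntax with a Java target. The names Refs, QUALIFIED_NAME, qualifiedName 
and buf are hypothetical. As I understand the ANTLR 3 rewrite syntax, 
TOKEN[someToken, someText] creates a node of that type, copying position 
information from someToken and using someText as the node text.

```antlr
grammar Refs;  // hypothetical grammar name

options {
    output = AST;  // build trees so the rewrite (->) operator is available
}

tokens {
    QUALIFIED_NAME;  // hypothetical imaginary token that carries the joined text
}

qualifiedName
@init {
    // Accumulates the text of each ID as the parser matches it
    StringBuilder buf = new StringBuilder();
}
    :   first=ID { buf.append($first.text); }
        ( DOT next=ID { buf.append('.').append($next.text); } )*
        // One node: QUALIFIED_NAME[token, text] copies position info
        // from $first and uses the assembled string as its text
        -> QUALIFIED_NAME[$first, buf.toString()]
    ;

ID  : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
DOT : '.' ;
WS  : (' '|'\t'|'\r'|'\n')+ { $channel = HIDDEN; } ;
```

For input such as a.b.c this should produce one QUALIFIED_NAME node with 
the text "a.b.c", while DOT remains an ordinary token the rest of the 
parser can use elsewhere.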

Thx for the clarification.
-- Curtis

Jim Idle wrote:
> However, what a person asks for, and what they need are not necessarily 
> the same thing. ;-) 
> 
> It doesn't have to be an imaginary token, but it usually is because 
> there won't be a lexer defined token to use with the rewrite, given that 
> you are parsing the construct rather than lexing it. 
> 
> So, you are parsing the elements of some complicated reference or 
> variable or class, etc., and you need '.' in other places in your parser, 
> and you also need to look at the individual pieces of the id. When you 
> send the reference to the tree parser, you want to tag it with something 
> to introduce it as a reference. Hence you would usually use an imaginary 
> token as the place holder for the reference and have it introduce the 
> individual pieces of the reference, which can then be looked up to find 
> out if they are enumerations, objects and so on, such that the tree 
> parser deals with them accordingly. 
> 
> If you pass the whole thing in as one token from the lexer, then you 
> will probably end up splitting the token text anyway, so you can look up 
> the context. However, if you never need to do this, then a lexical 
> solution probably does work for you. Trying to apply context within 
> lexer rules, though, is definitely not something you should be doing by 
> choice.
> 
> Now keep in mind that there are always 18 ways to skin a cat, and that's 
> just the way I do such things; it's whatever floats your boat in the end 
> :-)
> 
> Check the wiki or book for the rewrite syntax, but you can set the text 
> of a token when you rewrite it.
> 
> So, your options are:
> 
> 1) Lexical, if there is no need to do anything with the different 
> components (maybe you are formatting and don't need to know what it is, 
> for instance).
> 2) Declare a local String variable, append each ID's text to it as you 
> match it, then rewrite with that string as the token text. You would 
> only do this to keep the lexer rules simpler or to avoid some lexing 
> ambiguity, say, since putting the text back together is otherwise kind 
> of redundant (I think this was why the first question received the 
> answer it did).
> 3) Rewrite the place-holding imaginary token and each of the name 
> components. If you can work out the type or context at this stage in the 
> parse, then you might rewrite to one of several imaginary tokens, but if 
> you have to parse the whole thing before you can work out types, then you 
> would use one token and resolve the types in the next phase.
> 
> So:
> tokens
> {
> 	REFERENCE;
> }
> 
> id:
>    i+=ID (DOT i+=ID)* -> ^(REFERENCE $i+)
> ;
> 
> Or perhaps, if you have context, something like
> 
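> id:
>    v=ID (DOT r+=ID)*
>      // lookup() is a hypothetical symbol-table helper; OBJECT is another
>      // imaginary token declared in tokens {}
>      -> {lookup($v.text) == OBJECT}? ^(OBJECT $v $r*)
>      -> ^(REFERENCE $v $r*)   // fallback when no predicate matches
> ;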
> 
> Jim
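
On the tree-parser side Jim describes, a rule consuming the 
^(REFERENCE ID+) subtree built by his first example might look like the 
following. This is a minimal sketch: RefWalker and the tokenVocab name 
MyParser are hypothetical, and the println action only stands in for 
whatever lookup (enumeration, object, and so on) a real walker would do.

```antlr
tree grammar RefWalker;  // hypothetical name

options {
    tokenVocab = MyParser;     // hypothetical: the parser grammar that builds ^(REFERENCE ID+)
    ASTLabelType = CommonTree; // so $part.text resolves against a CommonTree node
}

reference
    :   ^(REFERENCE
            ( part=ID
              // Placeholder action: a real walker would look up each
              // component here (enumeration, object, and so on)
              { System.out.println("component: " + $part.text); }
            )+
         )
    ;
```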