[antlr-interest] Enhancement request for identifying imaginary tokens

Johannes Luber jaluber at gmx.de
Thu Dec 20 07:20:44 PST 2007


Steve Bennett wrote:
> On 12/15/07, Terence Parr <parrt at cs.usfca.edu> wrote:
>> Hi. Why not do what I do:
>>
>> s : 'if' e 'then' s -> ^('if' e s)
>>    | e ';' -> e
>>    ;
>>
>> etc...  no need for imaginary tokens.  Remember imag is for nodes that
>> have no corresponding input token.
> 
> I'm a bit unclear on this. Not having yet gotten to the tree-walking
> or output generation phases, I don't know the ramifications of the
> choices, but it seems somehow a bit untidy having an AST composed of
> both imaginary and real tokens. I would be inclined towards using ITs
> everywhere, in order to have consistent naming etc, but this isn't a
> good idea?

I've found three situations where imaginary tokens are useful:

1. To rename a token so that it reflects the context it appears in. For
example, '<' (token LT) can be either the less-than operator or the
opening bracket of a generic type argument list. I disambiguate with
OP_LT[$LT] or OPEN_GENERICS[$LT].
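As a rough sketch in ANTLR 3 rewrite syntax (not my actual grammar),
that could look like the fragment below. The rule names
relational_expression, shift_expression, type_argument_list and type,
and the GT and COMMA tokens, are hypothetical. OP_LT and OPEN_GENERICS
are declared as imaginary tokens. OP_LT[$LT] creates a node of type
OP_LT that takes its text and position from the matched LT token, so
later phases can tell the two meanings apart by node type alone:

    tokens { OP_LT; OPEN_GENERICS; }

    // '<' as the less-than operator (left-associative chain)
    relational_expression
        :   (a=shift_expression -> $a)
            (LT b=shift_expression
                -> ^(OP_LT[$LT] $relational_expression $b))*
        ;

    // '<' as the opening bracket of a generic argument list
    type_argument_list
        :   LT type (COMMA type)* GT -> ^(OPEN_GENERICS[$LT] type+)
        ;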

2. As a root where no suitable real token is available. I'm ambivalent
about this one, as I'm not sure whether you ever really need something
like ^(ROOT $rule otherToken) or whether ^(otherToken $rule) is always
sufficient. But there is at least this case:

explicit_anonymous_function_parameter
    :   anonymous_function_parameter_modifier? type IDENTIFIER
        -> ^(EXPLICIT_ANONYMOUS_FUNCTION_PARAMETER
                ^(OPTIONAL anonymous_function_parameter_modifier?)
                type IDENTIFIER)
    ;

Without the imaginary root, the rewrite would have to begin with "^(^(",
i.e. a tree used as the root of another tree, which is illegal. The
made-up root prevents this.

3. Disambiguation in tree grammars. I prefer to keep backtracking and
predicates out of tree grammars as much as possible, as they usually
just repeat work the parser has already done. When the grammar analysis
reports an ambiguity that can't easily be resolved by left-factoring, I
insert a special, unique imaginary token at the beginning of the
ambiguous token sequence.
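A minimal sketch of the idea, assuming a hypothetical language where a
statement is either a local variable declaration (type IDENTIFIER) or an
expression. Both can start with an IDENTIFIER, so the parser needs a
syntactic predicate to choose. Without a marker, the tree grammar would
face the same ambiguity and need the same predicate again. Prefixing the
declaration with a made-up LOCAL_VARIABLE token (declared in tokens { })
lets the tree grammar decide on the first node. All rule and token names
here are illustrative only:

    // parser grammar: the predicate does the hard work once
    statement
        :   (type IDENTIFIER)=> type IDENTIFIER ';'
                -> LOCAL_VARIABLE type IDENTIFIER
        |   expression ';' -> expression
        ;

    // tree grammar: the marker makes the choice LL(1), no predicate
    statement
        :   LOCAL_VARIABLE type IDENTIFIER
        |   expression
        ;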

Hope that helps,
Johannes