[antlr-interest] Irregular AST construction

Benjamin S Wolf jokeserver at gmail.com
Wed Jul 4 18:11:58 PDT 2012

Hi Mike,

If I had to guess, I'd say Antlr is not actually looking at the end of
that rule when it generates the code for the subrule (so it uses
streams in case you have more of the same tokens or rules); you'll get
similar code if you leave out the ( COLLATE_SYM identifier )* but
leave the parens around the alternatives.

Personally, I've always had problems with rewrite rules appearing in
alternatives, particularly when I try to reference labels that appear
later in the rule (e.g. in your example, you can't reference
"identifier" in a rewrite rule in the subrule above it). My strategy
in such cases is to use semantic predicates instead of embedding rewrite
rules in subrules:

                | PLUS_SYM p1=primary
                | MINUS_SYM p2=primary
        (options {greedy = true;}: COLLATE_SYM identifier)*
        -> {$function_call}? ^(FUNCTION_CALL function_call)
        -> literal? field_name? ... PLUS_SYM? $p1? MINUS_SYM? $p2? ... interval_expression? ( COLLATE_SYM identifier )*

Your case is a little longer than mine have been, so that kind of
rewrite rule may be a little heavyweight. You could also try moving
the long list of alternatives into its own rule: if the alternatives
aren't wrapped in parentheses, it'll generate the right code. If you
only want a flat list of tokens instead of a tree, that also works
fine with the COLLATE subrule, though the COLLATE subtree will then
also be returned in the function_call case, which your current rule
doesn't do.
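A rough sketch of that split, assuming the grammar uses `output = AST;` and declares FUNCTION_CALL as an imaginary token in its `tokens { }` section. The helper rule name `primary_base` is hypothetical (not from Mike's grammar); the alternatives are copied from the rule quoted below, with the rewrite attached directly to the function_call alternative at the top level of the new rule:

```antlr
// Hypothetical split of the primary rule (ANTLR 3, output = AST assumed).
// primary keeps the COLLATE suffix and has no rewrite, so automatic AST
// construction returns primary_base's tree followed by any COLLATE_SYM
// and identifier nodes.
primary
    : primary_base (options {greedy = true;}: COLLATE_SYM identifier)*
    ;

// Alternatives sit at the top level of the rule (no enclosing parentheses).
// Only the function_call alternative has a rewrite; the others rely on
// automatic AST construction.
primary_base
    : literal
    | field_name
    | function_call -> ^(FUNCTION_CALL function_call)
    | PARAM_MARKER
    | variable
    | PLUS_SYM primary
    | MINUS_SYM primary
    | BITWISE_NOT primary
    | LOGICAL_NOT primary
    | BINARY_SYM primary
    | ROW_SYM expression_list
    | EXISTS_SYM subquery
    | match_expression
    | case_expression
    | interval_expression
    ;
```

Whether this avoids the empty tree in the C target hasn't been verified here; it simply follows the guess that the parentheses around the alternatives are what trigger the stream-based code.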

On Wed, Jul 4, 2012 at 12:05 AM, Mike Lischke <mike at lischke-online.de> wrote:
> Hi,
> this might be related to the C target only, but I'm not sure.
> Given this rule:
> primary:
>         (
>                 literal
>                 | field_name
>                 | function_call// -> ^(FUNCTION_CALL function_call)
>                 | PARAM_MARKER
>                 | variable
>                 | PLUS_SYM primary
>                 | MINUS_SYM primary
>                 | BITWISE_NOT primary
>                 | LOGICAL_NOT primary
>                 | BINARY_SYM primary
>                 | ROW_SYM expression_list
>                 | EXISTS_SYM subquery
>                 | match_expression
>                 | case_expression
>                 | interval_expression
>         )
>         (options {greedy = true;}: COLLATE_SYM identifier)*
> ;
> I see a completely different tree construction in the generated parser depending on whether I enable the single (out-commented) rewrite rule or not. If I leave it out then everything is ok. With it, though, the generated code switches to using local streams for each alternative, but does not create the root_0 node (except for the function_call alternative). As a consequence the primary() function returns an empty tree for most of the alternatives.
> So far I didn't have the impression that I have to add a rewrite rule to every alternative if one of them has one. Is this a bug in code generation, or should I now add a rewrite rule to all alternatives in such scenarios?
> Mike
> --
> www.soft-gems.net
