[antlr-interest] match parser rule inside every rule (compile time reflections)

Mon Jan 10 16:13:21 PST 2011

It does seem a shame that you can't actually implement
languages like C/C++ very directly with any parser generator
I know of. Your example is hard for much the same reason
that integrating the C preprocessor and compiling C++
classes are hard.
I wonder if mechanisms like reflections might not become
more common if parser generator tools were better equipped
to make them less painful.

I've been tinkering with an idea I call "micropasses" for this
reason. In your example:

> class Test
> {
>     int i,j;
>
>     Test()
>     {
>         //iterates over all members of Test
>         #for_all($m,Test@members)
>         {
>             $m=0;
>         }
>         //will be evaluated to:
>         i=0;
>         j=0;
>     }
> }

So, one micropass runs from 'class' to the final '}' to
expand class-level reflections (none in this
example) and gather up member names/types.
Whether it's safe to combine those two
activities in one micropass depends on the precise
abilities of your reflections. That produces an
output token stream that can be processed again
by the parser.
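
To make that first micropass concrete, here is a rough
Python sketch of the "gather up member names/types" half.
Everything in it is an assumption for illustration: the
regex tokenizer, the names tokenize and gather_members,
and the simplification that a field declaration is just
"type name, name;" at class scope. It is not how ANTLR
does anything, only the shape of the idea.

```python
import re

# Hypothetical tokenizer: identifiers (optionally prefixed by the '#' or '$'
# reflection sigils), integers, and single punctuation characters.
TOKEN_RE = re.compile(r"//[^\n]*|[#$]?[A-Za-z_]\w*|\d+|\S")

def tokenize(src):
    # '//' comments are matched as one token and then dropped.
    return [t for t in TOKEN_RE.findall(src) if not t.startswith("//")]

def gather_members(tokens):
    """Micropass 1: scan from 'class' to its final '}' and return
    (class_name, [(type, name), ...]) for fields declared at class scope."""
    start = tokens.index("class")
    class_name = tokens[start + 1]
    members, stmt, depth = [], [], 0
    for tok in tokens[start + 2:]:
        if tok == "{":
            depth += 1
            stmt = []          # tokens before '{' were a function header
        elif tok == "}":
            depth -= 1
            stmt = []
            if depth == 0:
                break          # final '}' of the class ends the micropass
        elif depth == 1:       # only class scope; function bodies are skipped
            if tok == ";":
                if len(stmt) >= 2 and "(" not in stmt:
                    typ = stmt[0]
                    members += [(typ, n) for n in stmt[1:] if n != ","]
                stmt = []
            else:
                stmt.append(tok)
    return class_name, members

SRC = """
class Test
{
    int i,j;

    Test()
    {
        //iterates over all members of Test
        #for_all($m,Test@members)
        {
            $m=0;
        }
    }
}
"""

result = gather_members(tokenize(SRC))
```

Function bodies are deliberately left untouched here; they
are the business of the later micropass.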

Micropass #2 takes the output stream from
micropass #1 (class declarations now all known)
and for each member that has a function definition:
    Perform a micropass to expand reflections, and
    generate member function body code (again,
    might have to split into two passes depending
    on how wild the powers of reflection are).

In the previous description, "combine" really
means that when you hit a reflection "#" token,
you pause and perform a micropass whose output
effectively replaces the tokens that were inside
that # directive.
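
A minimal sketch of that "pause and replace" step, again in
Python and again with made-up names (expand_for_all, a
members table keyed by class name). It assumes the tokens
have already been split so that the directive header reads
#for_all ( $m , Test @ members ) and that the member names
were collected by an earlier micropass.

```python
def expand_for_all(tokens, members):
    """Copy tokens through; on '#for_all', emit one copy of the directive
    body per member, with the loop variable replaced by the member name."""
    out, i = [], 0
    while i < len(tokens):
        if tokens[i] != "#for_all":
            out.append(tokens[i])
            i += 1
            continue
        # Assumed header layout: #for_all ( $v , Class @ members ) {
        var, cls = tokens[i + 2], tokens[i + 4]
        i += 8                      # index of the '{' opening the body
        depth, j = 0, i
        while True:                 # find the matching '}'
            if tokens[j] == "{":
                depth += 1
            elif tokens[j] == "}":
                depth -= 1
                if depth == 0:
                    break
            j += 1
        # Recurse so that a directive nested in the body is expanded too.
        body = expand_for_all(tokens[i + 1:j], members)
        for name in members[cls]:
            out.extend(name if t == var else t for t in body)
        i = j + 1                   # resume after the directive's '}'
    return out

ctor = ["Test", "(", ")", "{",
        "#for_all", "(", "$m", ",", "Test", "@", "members", ")",
        "{", "$m", "=", "0", ";", "}",
        "}"]
expanded = expand_for_all(ctor, {"Test": ["i", "j"]})
```

The output list is an ordinary token stream with no trace
of the directive, so the normal function-body rules can
parse it without knowing reflections exist.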

Whipping token streams around and reusing them
is not devastatingly hard, though arriving at a syntax
that integrates with normal grammar syntax and can
still be deemed a readable representation of what
grammar gets applied to which token stream when
is harder.

This mechanism could not be deemed a success if it
didn't handle normal C preprocessing and C++ class
multi-passing with reasonable aplomb. It's really just
a recognition that a lot of multipass work actually only
ends up touching a very small percentage of the entire
token stream in practice, so why not have a mechanism
for interweaving the multiple passes, invoking them
only exactly as needed?
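
As a toy illustration of "only exactly as needed", here is
a Python sketch of a driver that walks the stream once and
calls a micropass only when it sees a trigger token. The
names interleave and twice, and the #twice directive
itself, are invented for the example; a real version would
hand the replacement tokens back to the parser rather than
just appending them.

```python
def interleave(tokens, handlers):
    """One left-to-right pass. handlers maps a trigger token to a function
    (tokens, i) -> (replacement_tokens, next_index). Returns the new stream
    and how many input tokens were actually consumed by micropasses."""
    out, i, touched = [], 0, 0
    while i < len(tokens):
        handler = handlers.get(tokens[i])
        if handler is None:
            out.append(tokens[i])   # the common case: pass straight through
            i += 1
        else:
            replacement, nxt = handler(tokens, i)
            touched += nxt - i
            out.extend(replacement)
            i = nxt
    return out, touched

def twice(tokens, i):
    # Hypothetical directive: '#twice X' is replaced by 'X X'.
    return [tokens[i + 1]] * 2, i + 2

stream = ["a", "=", "1", ";", "#twice", "b", ";", "c", "=", "2", ";"]
out, touched = interleave(stream, {"#twice": twice})
```

Here the micropass consumes 2 of the 11 input tokens;
everything else is copied through without a second look.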

> So I would need to put the
> rule for #for_all into all the rules inside my grammar which seems ugly
> and cumbersome.

Your adjectives seem apt.

> Any ideas how to solve this?

Just about any compilation problem can be solved by
adding on more passes :-). Not fun, but always possible.
Gets rid of some of the ugly, but not necessarily the
cumbersome.