[antlr-interest] NQOT: Grammar meta-programming

Steve Bennett stevagewp at gmail.com
Sat Dec 8 18:54:08 PST 2007


On 12/8/07, Andy Tripp <antlr at jazillian.com> wrote:
>  It does seem like you could get 90% of the way in parsing languages like C,
> C++, Java, and C#
>  with some generic scheme. But in practice,  it turns out to  be  less than
> 10%. That's because
>  a *basic* parser, which covers most of the language, seems like it's 90% of
> the effort, but it's
>  really only 10%. So let's say your *basic* parser covers expressions, flow
> control, and function declarations.
>  Great! Now start on function pointers, templates/generics, operator
> overloading,
>  preprocessor directives, typedef, etc. What's the saying? "In software, the
> first 90% takes 90% of the time,
>  and the last 10% takes the OTHER 90% of the time" :)

Yep, that wouldn't surprise me. However, if the goal is to handle
esoteric languages (particularly where the syntax could be tweaked),
most of these really horrible aspects probably won't be present. Well,
that's my understanding of esoteric languages: odd syntax, odd
semantics, but generally straightforward to parse.

>  Then try it on real-world programs, and you'll be amazed at
>  what strange syntax is actually considered valid (especially in C++). Your
> parser fails, you

C++ is not a language anyone in their right mind would want to try and
parse. Ever. I should have ruled it out.

>  And that's not even mentioning column-based formatting (FORTRAN, COBOL),
>  and constructs within the program that can change how the program should be
> parsed (COBOL).

Those would probably fall outside the immediate group of languages to
be dealt with.


>  Beyond just successful parsing, there's the issue of creating a reasonable
> AST.
>  As a simple example, the return type of a C/C++ function is optional, but
> you might want
>  the AST to have a return type. So you want to stuff an "int" node in the
> AST during parsing.
>  Or you might want to build symbol tables during parsing.

That doesn't strike me as particularly remarkable semantics.
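In a hand-written builder it's only a few lines of Java anyway; roughly
something like this (just a sketch with made-up class and method names,
not ANTLR's actual tree API):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    class Node {
        final String text;
        final List<Node> children = new ArrayList<>();
        Node(String text) { this.text = text; }
        Node add(Node child) { children.add(child); return this; }
    }

    class FunctionBuilder {
        private final Map<String, Node> symbols = new HashMap<>();

        // Called from a parser action once a function declaration is seen.
        Node functionDecl(String name, Node returnType) {
            // The return type is optional in the grammar, but we want every
            // FUNC node in the AST to carry one, so stuff in a default "int".
            Node type = (returnType != null) ? returnType : new Node("int");
            Node func = new Node("FUNC").add(type).add(new Node(name));
            symbols.put(name, func);  // build the symbol table as we parse
            return func;
        }
    }

Hooking that into a parser action gives you the normalised AST and the
symbol table in one pass.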

>
>  So for any real-world app, you could copy-and-paste some common parts of a
> grammar
>  (such as grabbing the expression-part of a C grammar for use in your Java
> grammar), or
>  you could come up with some fancy generic mechanism. Either way, that work
> is going to
>  be a tiny amount of the total work (say 0.01%) anyway.

I think it'll be more than that... perhaps I'm really thinking in terms
of a rapid prototyping tool, rather than a genuine solution for
building industrial-strength parsers.

>  One way to approach this is to compare the C and Java ANTLR grammars.
> You'll see that
>  these apparently similar languages don't have all that much in common in
> their grammars.

Interesting.

>  On the other hand...one approach I've thought about would be to use
> programming-by-example.
>  You feed your magic tool sets of examples: "Here is a program, and here is
> the AST that
>  it should produce". If you can make your set of examples exhaustive (i.e.
> cover all language
>  constructs), that seems like it might work.

Yeah. Or even, "Translate the following pseudocode into your
language." Then the metaparser will know the meaning of your code
without you having to write the AST explicitly.
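Each example would just be a source fragment paired with the tree it
should produce, something like this (a made-up illustration; the
ExamplePair record and the S-expression form are mine, not any existing
tool's input format):

    // One "program -> expected AST" example pair for the magic tool.
    record ExamplePair(String source, String expectedAst) {}

    class Examples {
        static final ExamplePair RETURN_STMT = new ExamplePair(
                "return x + 1;",
                "(RETURN (+ (ID x) (INT 1)))");
        // An exhaustive set would need one such pair per language
        // construct before the tool could infer a usable grammar.
    }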

Steve

