[antlr-interest] NQOT: Grammar meta-programming

Fri Dec 7 14:02:04 PST 2007

Gavin Lambert wrote:
> At 05:02 8/12/2007, Andy Tripp wrote:
>> On the other hand...one approach I've thought about would be to use
>> programming-by-example. You feed your magic tool sets of examples:
>> "Here is a program, and here is the AST that it should produce". If you
>> can make your set of examples exhaustive (i.e. cover all language
>> constructs), that seems like it might work.
>>
>> So the tool could store its grammar in whatever format it wants 
>> (ANTLR or something completely different), but you'd essentially 
>> define your parser not as a traditional BNF-style grammar, but rather 
>> as a set of example (input, AST) pairs.
>
> That'd be pretty cool.  Although I suspect in practice you'd probably 
> need to have an (input, token stream, AST) triplet (or sets of 
> input=>tokens and sets of tokens=>ASTs).  Going straight from input to 
> AST is probably a bit too hard.

After thinking about it, I wonder whether what I'm really looking for is
just a wizard: something that would step through the grammar
specification process, whip up a lexer and a parser, and then stay out
of the way. Most of the rest could be done by describing how the new
language differs from existing ones:

What is the purpose of this recognizer: [compiling | interpreting | rewriting | validating]

Use traditional 'C' identifier tokens? [y/n]
Allow these extra characters *inside* an identifier: [_____]
Allow these extra characters to *start* an identifier: [______]

What operator syntax do you want to use: [C | Fortran | SQL]

What keywords do you want to start with: [C | C++ | Java | Eiffel | SQL | Fortran]

Edit the list of keywords here: [ (big text entry box) ]
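To make the "differencing" idea a bit more concrete, here is a minimal sketch of a wizard back end that turns the identifier and keyword answers above into ANTLR-style lexer rule text. Everything in it is hypothetical (no such tool exists; `identifier_rule`, `keyword_rules` and `BASE_KEYWORDS` are made-up names). It assumes that "traditional C identifiers" means `[A-Za-z_][A-Za-z0-9_]*`, that answering "n" simply drops the underscore, and that any character allowed to start an identifier is also allowed inside one. The keyword table is a small illustrative subset, not a complete list for either language.

```python
# Hypothetical wizard back end: builds ANTLR-style lexer rule text from the
# answers to the identifier and keyword questions. Not an existing tool.

# Illustrative subsets only; a real wizard would ship complete keyword lists.
BASE_KEYWORDS = {
    "C": {"if", "else", "while", "for", "return", "int", "char", "void"},
    "SQL": {"select", "from", "where", "insert", "update", "delete"},
}


def char_alts(chars):
    """Render each extra character as a quoted ANTLR character literal."""
    return ["'" + c.replace("\\", "\\\\").replace("'", "\\'") + "'" for c in chars]


def identifier_rule(c_style=True, extra_inside="", extra_start="", name="ID"):
    """Build the identifier rule from the y/n answer and the two extra-character boxes."""
    start = ["'a'..'z'", "'A'..'Z'"]
    if c_style:
        start.append("'_'")  # C also lets identifiers start with an underscore
    start += char_alts(extra_start)
    # Assumption: anything that may start an identifier may also appear inside it.
    inside = start + ["'0'..'9'"] + char_alts(extra_inside)
    # dict.fromkeys drops duplicates while keeping first-seen order.
    start = list(dict.fromkeys(start))
    inside = list(dict.fromkeys(inside))
    return f"{name} : ({'|'.join(start)}) ({'|'.join(inside)})* ;"


def keyword_rules(base, add=(), remove=()):
    """Start from a base language's keywords, then apply the user's edits."""
    words = (BASE_KEYWORDS[base] | set(add)) - set(remove)
    return [f"{w.upper()} : '{w}' ;" for w in sorted(words)]


if __name__ == "__main__":
    for rule in keyword_rules("C", add={"unless"}, remove={"for"}):
        print(rule)
    print(identifier_rule(c_style=True, extra_inside="$"))
```

In a generated grammar the keyword rules would go ahead of the ID rule, since ANTLR resolves a tie between two rules matching the same input in favor of the one defined first.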

This would obviously grow more elaborate as more languages were brought
into the fold and as more "purposes" were added, possibly to the point
of generating tree parsers automatically as well.

> Actually even the first half of that (input => token stream) would be 
> a big help in many cases.  Since ANTLR doesn't have much debugging 
> support for lexers, it's easy to accidentally break something in weird 
> ways (especially if you don't have unit tests).

I'm working on that right now. I've got a rewrite of gunit that does 
token stream checking -- that's the string expression thingy I was 
talking about. I've got to actually "go to work" next week, but I'll 
probably have something alpha-ready sometime during or shortly after the 
holidays.

=Austin