[antlr-interest] NQOT: Grammar meta-programming
Austin Hastings
Austin_Hastings at Yahoo.com
Fri Dec 7 14:02:04 PST 2007
Gavin Lambert wrote:
> At 05:02 8/12/2007, Andy Tripp wrote:
>> On the other hand...one approach I've thought about would be to use
>> programming-by-example.
>> You feed your magic tool sets of examples: "Here is a program, and
>> here is the AST that
>> it should produce". If you can make your set of examples exhaustive
>> (i.e. cover all language constructs), that seems like it might work.
>>
>> So the tool could store its grammar in whatever format it wants
>> (ANTLR or something completely different), but you'd essentially
>> define your parser not as a traditional BNF-style grammar, but rather
>> as a set of example (input, AST) pairs.
>
> That'd be pretty cool. Although I suspect in practice you'd probably
> need to have an (input, token stream, AST) triplet (or sets of
> input=>tokens and sets of tokens=>ASTs). Going straight from input to
> AST is probably a bit too hard.
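To make the triplet idea concrete, here is a minimal Python sketch of how one set of (input, token stream, AST) examples could be split into the input=>tokens and tokens=>AST halves, so each half can be checked on its own. It is not from the thread: the `Example` class, the `split_examples` function, the token names, and the nested-tuple AST shape are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Hypothetical AST shape: a leaf is a string, an interior node is a tuple
# (operator, child, child, ...), e.g. ("=", "x", ("+", "1", "2")).
AST = Union[str, Tuple["AST", ...]]
Token = Tuple[str, str]  # (token type, token text)


@dataclass(frozen=True)
class Example:
    source: str                # program text fed to the tool
    tokens: Tuple[Token, ...]  # expected token stream (whitespace dropped)
    ast: AST                   # expected tree


def split_examples(examples: List[Example]):
    """Split (input, tokens, AST) triplets into input=>tokens and tokens=>AST sets."""
    lexer_cases = [(ex.source, ex.tokens) for ex in examples]
    parser_cases = [(ex.tokens, ex.ast) for ex in examples]
    return lexer_cases, parser_cases


examples = [
    Example(
        source="x = 1 + 2;",
        tokens=(("ID", "x"), ("ASSIGN", "="), ("INT", "1"),
                ("PLUS", "+"), ("INT", "2"), ("SEMI", ";")),
        ast=("=", "x", ("+", "1", "2")),
    ),
]
lexer_cases, parser_cases = split_examples(examples)
```

Keeping the token stream in the middle means a lexer bug shows up in `lexer_cases` before it can muddy the tree comparisons in `parser_cases`.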
After thinking about it, I wonder if what I'm really looking for isn't
just a wizard. Something that would step through the grammar
specification process, whip up a lexer and a parser, and then stay out
of the way. Most of the rest of it could be done by differencing with
existing languages:
What is the purpose of this recognizer: [ compiling | interpreting | rewriting | validating ]
Use traditional 'C' identifier tokens? [y/n]
Allow these extra characters *inside* an identifier: [_____]
Allow these extra characters to *start* an identifier: [______]
What operator syntax do you want to use: [C | Fortran | SQL]
What keywords do you want to start with: [C | C++ | Java | Eiffel | SQL | Fortran ]
Edit the list of keywords here: [ (big text entry box) ]
This would obviously get more elaborate as more languages were brought
into the fold, and it would also get more elaborate as more "purposes"
were added -- possibly to the point of providing tree parsers
automatically as well.
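A rough sketch of the last step of such a wizard, assuming the identifier and keyword questions have already been answered: it turns the answers into ANTLR 3-style lexer rules. `WizardAnswers`, `char_lit`, `lexer_rules` and the `KW_` prefix are hypothetical names; the sketch assumes the user answered "y" to C identifier tokens and that keywords are plain alphanumeric words.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical wizard answers; the field names are illustrative only and
# do not correspond to any existing ANTLR tool.
@dataclass
class WizardAnswers:
    extra_inner: str = ""   # extra characters allowed *inside* an identifier
    extra_start: str = ""   # extra characters allowed to *start* an identifier
    keywords: List[str] = field(default_factory=list)  # assumed alphanumeric


def char_lit(ch: str) -> str:
    """Quote one character as an ANTLR character literal, escaping ' and \\."""
    if ch in "\\'":
        return "'\\" + ch + "'"
    return "'" + ch + "'"


def lexer_rules(a: WizardAnswers) -> str:
    # Base case: C-style identifiers (letters and '_' to start).
    start = ["'a'..'z'", "'A'..'Z'", "'_'"] + [char_lit(c) for c in a.extra_start]
    # Anything that may start an identifier may also continue it;
    # dict.fromkeys drops duplicates while keeping order.
    inner = list(dict.fromkeys(
        start + ["'0'..'9'"] + [char_lit(c) for c in a.extra_inner]))
    start = list(dict.fromkeys(start))
    lines = []
    # Keyword rules are listed before ID, following the usual ANTLR 3
    # convention for keywords that would otherwise also match ID.
    for kw in a.keywords:
        lines.append(f"KW_{kw.upper()} : '{kw}' ;")
    lines.append(f"ID : ({'|'.join(start)}) ({'|'.join(inner)})* ;")
    return "\n".join(lines)


print(lexer_rules(WizardAnswers(extra_inner="$", extra_start="@",
                                keywords=["if", "while"])))
```

The `KW_` prefix is there so that a keyword such as `id` cannot collide with the generated `ID` rule; "differencing" against another language would then amount to starting from its keyword list and editing it.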
> Actually even the first half of that (input => token stream) would be
> a big help in many cases. Since ANTLR doesn't have much debugging
> support for lexers, it's easy to accidentally break something in weird
> ways (especially if you don't have unit tests).
I'm working on that right now. I've got a rewrite of gunit that does
token stream checking -- that's the string expression thingy I was
talking about. I've got to actually "go to work" next week, but I'll
probably have something alpha-ready sometime during or shortly after the
holidays.
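As an illustration of input => token stream checking in general (not the gunit rewrite itself, whose syntax isn't shown here), a minimal Python sketch: a toy regex lexer stands in for an ANTLR-generated one, and the expectation is a space-separated string of token types. The token names and the `tokenize` and `check_token_stream` helpers are hypothetical.

```python
import re
from typing import List, Tuple

# Toy lexer rules standing in for an ANTLR-generated lexer (hypothetical).
# Python's regex alternation takes the first alternative that matches,
# so order matters here.
TOKEN_SPEC = [
    ("INT", r"[0-9]+"),
    ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("SEMI", r";"),
    ("WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Return (type, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"no token matches at offset {pos}: {text[pos:pos + 10]!r}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def check_token_stream(text: str, expected: str) -> Tuple[bool, str]:
    """Compare the token types of `text` against a space-separated expectation."""
    actual = " ".join(kind for kind, _ in tokenize(text))
    return actual == expected, actual
```

For example, `check_token_stream("x = 1 + 2;", "ID ASSIGN INT PLUS INT SEMI")` returns `(True, "ID ASSIGN INT PLUS INT SEMI")`. Returning the actual stream alongside the verdict makes a failing lexer test say what it saw, which is most of the debugging help you want.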
=Austin