[antlr-interest] Macro definitions: most elegant solution?

Mon May 26 21:43:40 PDT 2003

One thing about my using antlr is that it's so cool and it eases 
things so much, it makes me feel dumb whenever I am doing something 
tedious: there must be an easier way if I knew the tool better.

I solved my previous problem (separating declarators from initializers 
in a C++-like grammar) by nicely matching the same AST fragment 
against multiple tree parsing rules: one filters out initializers, one 
transforms initializers into expressions, and so on. Sweet!

On to my current problem. I am trying to implement a macro definition 
facility with antlr. The macros are similar to C #defines, except that 
they don't suck :o). They're scoped and follow the language grammar 
much more closely.

So the syntax for the simplest case would be:

$define name { body }

Later, when seeing "name", the compiler will replace it with "body", 
also stripping the "{" and "}". If you want to define a macro that 
actually contains a "}", you can do that by saying:

$define name ( body )

If you want a macro that contains a "}" and a ")", you can say:

$define name [ body ]

Finally, if you want a macro that contains a "}", a ")", and a "]", 
you can say:

$define name sep body sep

where sep is any user-defined identifier. In all cases, the macro 
processor will eliminate the separators when expanding. I guess this 
looks a little baroque, but I believe it is coherent and useful.
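
To make the four separator forms concrete, here is a minimal sketch of 
how the body could be cut out of a token list. Everything in it is an 
assumption made for illustration: tokens are plain strings rather than 
antlr token objects, read_macro_body and CLOSERS are hypothetical names, 
and the body is taken to end at the first closing separator (no 
nesting), which is what the "use ( ) if the body contains }" rule above 
suggests.

```python
# Hypothetical sketch: tokens are plain strings, not antlr Token objects.
CLOSERS = {"{": "}", "(": ")", "[": "]"}


def read_macro_body(tokens, start):
    """Return (body_tokens, index_just_past_the_closing_separator).

    tokens[start] is the opening separator: "{", "(", "[", or any
    identifier, in which case the same identifier closes the body.
    Assumption: the first closing separator ends the body (no nesting).
    """
    opener = tokens[start]
    # A bracket maps to its partner; an identifier closes itself.
    closer = CLOSERS.get(opener, opener)
    try:
        end = tokens.index(closer, start + 1)
    except ValueError:
        raise SyntaxError("unterminated macro body, expected %r" % closer)
    # The separators themselves are stripped from the stored body.
    return tokens[start + 1:end], end + 1
```

With this, "$define m END } ) ] END" stores the three closing brackets 
as the body of m, and both END separators are dropped.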

So, the question is now - where to implement this facility? I tried to 
implement it in the lexer, but I hit this problem: upon reading one 
token, the lexer must return /multiple/ tokens to the caller, and it 
looks like the lexer wasn't designed to facilitate that. (Let me add 
in passing that it would be a great feature. Note that the lexer 
design already allows returning /fewer/ tokens than read, through the 
SKIP mechanism.)

Then I said I'd implement the feature straight in the parser, but I 
couldn't make that work because I couldn't figure out how to tell the 
parser "stitch in these new tokens and reparse them". 

So now I am making good progress with using the TokenStream concept. I 
have a MacroProcessor class that sits in between the lexer and the 
parser and reacts when it sees the '$' token. It then runs a 
rudimentary matching algorithm and maintains the macro definition 
tables.
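
A rough sketch of that in-between stage, to show how a pending-token 
queue lets one incoming token turn into several outgoing ones. This is 
not the actual MacroProcessor and does not use the antlr API: tokens 
are plain strings, next_token and the helper names are hypothetical, 
scoping is left out, and there is no guard against a macro that 
expands to itself.

```python
from collections import deque

CLOSERS = {"{": "}", "(": ")", "[": "]"}


class MacroProcessor:
    """Token filter between a lexer and a parser (hypothetical sketch)."""

    def __init__(self, lexer_tokens):
        self._source = iter(lexer_tokens)  # upstream token stream
        self._pending = deque()            # expansion tokens not yet handed out
        self._macros = {}                  # macro name -> list of body tokens

    def _read(self):
        # Pending expansion tokens come first, so expansions get rescanned.
        if self._pending:
            return self._pending.popleft()
        return next(self._source, None)    # None marks end of input

    def _define(self):
        # Called after "$" was consumed; expects: define name sep body sep
        if self._read() != "define":
            raise SyntaxError("expected 'define' after '$'")
        name = self._read()
        opener = self._read()
        closer = CLOSERS.get(opener, opener)
        body = []
        while True:
            tok = self._read()
            if tok is None:
                raise SyntaxError("unterminated macro %r" % name)
            if tok == closer:
                break
            body.append(tok)
        self._macros[name] = body

    def next_token(self):
        """Return the next token for the parser, or None at end of input."""
        while True:
            tok = self._read()
            if tok == "$":
                self._define()             # definitions never reach the parser
            elif tok in self._macros:
                # One input token becomes several: queue the body in order.
                self._pending.extendleft(reversed(self._macros[tok]))
            else:
                return tok
```

The parser only ever calls next_token(), so it never notices that a 
single "name" token was replaced by the whole body.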

This works OK; however, what I felt the need for was some kind of 
parser functionality. I mean, I want to define a little grammar such 
as:

macroDefinition
    : DOLLAR "define" name body
    ;

that has tokens as its input alphabet and outputs tokens as well 
(instead of ASTs). I saw a hint at that at the end of 
http://www.antlr.org/doc/streams.html.

So if you have any opinions on how I can improve this design, please 
let me know. Thanks!

Andrei
