[antlr-interest] Macro definitions: most elegant solution?
mzukowski at yci.com
mzukowski at yci.com
Tue May 27 10:27:09 PDT 2003
Take a look at http://www.codetransform.com/filterexample.html. Using a
parser as a token filter currently has some tricky issues, hopefully
explained well in this article.
Monty
-----Original Message-----
From: uprightness_of_character [mailto:andrei at metalanguage.com]
Sent: Monday, May 26, 2003 9:44 PM
To: antlr-interest at yahoogroups.com
Subject: [antlr-interest] Macro definitions: most elegant solution?
One thing about my using antlr is that it's so cool and it eases
things so much, it makes me feel dumb whenever I am doing something
tedious: there must be an easier way if I knew the tool better.
I solved my previous problem (separating declarators from initializers
in a C++-like grammar) by nicely matching the same AST fragment
against multiple tree parsing rules: one filters out initializers, one
transforms initializers into expressions, and so on. Sweet!
On to my current problem. I am trying to implement a macro definition
facility with antlr. The macros are similar to C #defines, except that
they don't suck :o). They're scoped and obey the language grammar much
more.
So the syntax for the simplest case would be:
$define name { body }
Later, when seeing "name", the compiler will replace it with "body",
also stripping the "{" and "}". If you want to define a macro that
actually contains a "}", you can do that by saying:
$define name ( body )
If you want a macro that contains a "}" and a ")", you can say:
$define name [ body ]
Finally, if you want a macro that contains a "}", a ")", and a "]",
you can say:
$define name sep body sep
where sep is any user-defined identifier. In all cases, the macro
processor will eliminate the separators when expanding. I guess this
looks a little baroque, but I believe it is coherent and useful.
So, the question is now - where to implement this facility? I tried to
implement it in the lexer, but I hit this problem: upon reading one
token, the lexer must return /multiple/ tokens to the caller, and it
looks like the lexer wasn't designed to facilitate that. (Let me add
in passing that it would be a great feature. Note that the lexer
design allows to return /less/ tokens than read, through the SKIP
mechanism.)
Then I said I'd implement the feature straight in the parser, but I
couldn't make that work because I couldn't figure out how to tell the
parser "stitch in these new tokens and reparse them".
So now I am making good progress with using the TokenStream concept. I
have a MacroProcessor class that sits in between the lexer and the
parser and reacts upon seeing the '$' token. It then does some
rudimentary matching algorithm and maintains the macro definition
tables.
This works ok; however, what I felt the need for was a kind of a
parser functionality. I mean, I want to define a little grammar such
as:
macroDefinition
: DOLLAR "define" name body
;
that has tokens as input alphabet, and ouputs tokens as well (instead
of ASTs). I saw there was a hint at that in the end of http://www.
antlr.org/doc/streams.html.
So if you have any opinions on how I can improve this design, please
let me know. Thanks!
Andrei
Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
More information about the antlr-interest
mailing list