[antlr-interest] Re: C++ grammar

Terence Parr parrt at jguru.com
Thu Jun 13 11:36:10 PDT 2002


On Thursday, June 13, 2002, at 11:31  AM, cppljevans wrote:

> --- In antlr-interest at y..., Terence Parr <parrt at j...> wrote:
>> Folks,
>>
>> A number of people are playing with a C++ front end for ANTLR (either
>> from scratch or by converting old PCCTS grammar forward to ANTLR).  I
>> might be putting some effort behind making a standard C++ parser for
>> ANTLR and could use any head start people have.  So, who's been doing
>> what? :)
>>
> I'm trying to convert Lilley's parser to a pretty printer for C++.
> I'm planning on using C++, and my current focus is getting
> the lexer to work.  The main problem is passing the "expanded"
> tokens to the parser; yet, just printing the "unexpanded" tokens.
> By "expanded" token, I mean the tokens that are the result of
> either #include <file> or processing a preprocessor macro.

Preprocessor stuff is typically done as a char stream filter so the C++ 
lexer is not complicated by the preprocessor.  Helps to separate these 
tasks.  It can be done, of course.  You might also just say "I'll use 
/lib/cpp" ;)  Naturally this makes pretty printing harder as you don't 
always know what was the original source ;)
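
As a rough illustration of the char stream filter idea, here is a
minimal sketch in plain C++.  It is not ANTLR code; the function name
preprocess is made up for the example, and it assumes only object-like
"#define NAME body" directives (no #include, no function-like macros,
no rescanning, and no awareness of strings or comments).

```cpp
#include <cctype>
#include <map>
#include <sstream>
#include <string>

// Hypothetical char-stream filter: takes raw source text and returns the
// text a C++ lexer would see. Assumption: only "#define NAME body" is
// recognized; everything else is copied through with macro names replaced.
std::string preprocess(const std::string& source) {
    std::map<std::string, std::string> macros;
    std::istringstream in(source);
    std::string line;
    std::string out;
    while (std::getline(in, line)) {
        std::istringstream ls(line);
        std::string directive, name;
        if (ls >> directive && directive == "#define" && ls >> name) {
            std::string body;
            std::getline(ls, body);
            std::size_t start = body.find_first_not_of(" \t");
            macros[name] = (start == std::string::npos) ? std::string()
                                                        : body.substr(start);
            continue;  // directive lines never reach the lexer
        }
        // Copy the line, replacing any identifier that names a macro.
        for (std::size_t i = 0; i < line.size();) {
            unsigned char c = static_cast<unsigned char>(line[i]);
            if (std::isalpha(c) || c == '_') {
                std::size_t j = i;
                while (j < line.size() &&
                       (std::isalnum(static_cast<unsigned char>(line[j])) ||
                        line[j] == '_')) {
                    ++j;
                }
                std::string id = line.substr(i, j - i);
                std::map<std::string, std::string>::const_iterator it =
                    macros.find(id);
                out += (it != macros.end()) ? it->second : id;
                i = j;
            } else {
                out += line[i];
                ++i;
            }
        }
        out += '\n';
    }
    return out;
}
```

Fed the test.cpp example from this thread, the filter drops the #define
line and hands "int b ;" to the lexer in place of "DECLB ;".  The
original spelling is gone by then, which is exactly the pretty-printing
problem mentioned above.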

> I haven't coded anything yet (except converting some of Lilley's
> data structures to STL), but I'm thinking of merging some of
> the ideas in http://www.antlr.org/doc/streams.html with
> Lilley's macro expansion methods ( see void
> CPreParserImp::ExpandTokenList in cpre_expand.cpp).
>
> To be more specific, I'm thinking of the lexer as a stack of
> iterators, where each iterator corresponds either to a file or
> a macro invocation.  The output tokens would only come from the
> bottom of the stack, whereas the parser would always read from
> the top.  Since the bottom corresponds to the original source file,
> only tokens from the original source would be output.

Yeah, a more general queue for TokenStream that let the lexer push more 
than one token onto the stream at once would be groovy.
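
A queue like that might look roughly like the sketch below.  This is
plain C++, not the ANTLR TokenStream interface; Token,
QueuedTokenStream, Producer and drain are hypothetical names chosen for
the example.  The assumption is that the lexer is modelled as a callback
that may append zero, one, or several tokens per call.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct Token {
    std::string text;
};

// Hypothetical queued token stream. The producer appends any number of
// tokens to the queue per call and returns false once input is exhausted.
class QueuedTokenStream {
public:
    typedef std::function<bool(std::deque<Token>&)> Producer;

    explicit QueuedTokenStream(Producer produce)
        : produce_(std::move(produce)) {}

    // Hands out one token at a time; returns false at end of input.
    bool nextToken(Token& out) {
        while (queue_.empty()) {
            // Ask the producer for more; stop only if it is done AND
            // left nothing behind in the queue.
            if (!produce_(queue_) && queue_.empty()) {
                return false;
            }
        }
        out = queue_.front();
        queue_.pop_front();
        return true;
    }

private:
    Producer produce_;
    std::deque<Token> queue_;
};

// Convenience helper for the example: read the stream to the end.
std::vector<std::string> drain(QueuedTokenStream& stream) {
    std::vector<std::string> texts;
    Token t;
    while (stream.nextToken(t)) {
        texts.push_back(t.text);
    }
    return texts;
}
```

A producer that sees DECLB could push both "int" and "b" in a single
call, and the parser would still pull them one at a time through
nextToken.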

>
> For example, given the following code in test.cpp:
>
> #define DECLB  int b
> int a;
> DECLB ;
> int c;
>
> Then the lexer stack, just before the read of b, would contain:
>
>     int b
>         ^
>     int a ; DECLB ; int c ;
>                   ^

Oh, well you can just push lexer input states for this.  There is a 
stack mechanism already for nested lexing and parsing.
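
To make the pushing idea concrete, here is a small sketch in plain C++.
It does not use ANTLR's own input-state mechanism; Source, StackedLexer,
next and original are hypothetical names, tokens are plain strings, and
the macro table is assumed to have been collected already.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One input state: a token sequence plus a read position.
struct Source {
    std::vector<std::string> tokens;
    std::size_t pos;
};

// Hypothetical stacked lexer. stack_[0] is the original file; each macro
// invocation pushes its body on top. The parser reads from the top, and
// only tokens taken from the bottom are recorded for printing.
// Assumption: macros are not self-referential (that would loop forever).
class StackedLexer {
public:
    typedef std::map<std::string, std::vector<std::string> > MacroTable;

    StackedLexer(std::vector<std::string> file, MacroTable macros)
        : macros_(std::move(macros)) {
        stack_.push_back(Source{std::move(file), 0});
    }

    // Next expanded token for the parser; false when the file is done.
    bool next(std::string& out) {
        while (!stack_.empty()) {
            Source& top = stack_.back();
            if (top.pos == top.tokens.size()) {
                if (stack_.size() == 1) return false;  // original exhausted
                stack_.pop_back();                     // macro body finished
                continue;
            }
            std::string tok = top.tokens[top.pos++];
            if (stack_.size() == 1) {
                original_.push_back(tok);  // unexpanded, for the printer
            }
            MacroTable::const_iterator it = macros_.find(tok);
            if (it != macros_.end()) {
                stack_.push_back(Source{it->second, 0});  // enter expansion
                continue;
            }
            out = tok;
            return true;
        }
        return false;
    }

    // Tokens consumed so far from the original file, macros unexpanded.
    const std::vector<std::string>& original() const { return original_; }

private:
    std::vector<Source> stack_;
    MacroTable macros_;
    std::vector<std::string> original_;
};
```

With the test.cpp tokens and DECLB mapped to "int b", the parser side
gets int a ; int b ; int c ; while original() keeps DECLB as written.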

> I'd appreciate any feedback on this design.

Cool.  Let us know how it goes.

Ter
--
Co-founder, http://www.jguru.com
Creator, ANTLR Parser Generator: http://www.antlr.org


 
