[antlr-interest] Pre-processor advice [C target]

Jim Idle jimi at temporal-wave.com
Thu Aug 30 13:44:29 PDT 2012


I would just create a couple of functions that work on an output buffer:
append the text of each token that makes it into the output to that
buffer, and append a blank line for each line that doesn't make it.

It looks to me like you can do that with a lexer-only grammar:

int skipDepth = 0; // Inc and decrement, output only when zero
                   // (not "switch", which is a C keyword)

IFDEF : '#ifdef' EXPR { process EXPR, inc/dec etc., add a newline } ;

...

NL    : '\n' { add a newline to preserve line numbers } ;

ANY   : . { add this character if output is on } ;
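
As a rough sketch of the helper functions the actions above could call,
something like the following might do. Everything here is an assumption
for illustration: PPOut, ppout_text, ppout_newline, ppout_ifdef and
ppout_endif are made-up names, not part of the ANTLR C runtime, and
#else/#elseif handling is left out.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable output buffer (not an ANTLR type). */
typedef struct {
    char  *data;        /* NUL-terminated output text, or NULL if empty */
    size_t len;         /* bytes used, excluding the terminator */
    size_t cap;         /* bytes allocated */
    int    skip_depth;  /* > 0 while inside a false #ifdef block */
} PPOut;

/* Make room for `extra` more bytes plus the terminator; 0 on failure. */
static int ppout_reserve(PPOut *o, size_t extra)
{
    size_t need = o->len + extra + 1;
    size_t ncap = o->cap ? o->cap : 64;
    char  *p;

    if (need <= o->cap)
        return 1;
    while (ncap < need)
        ncap *= 2;
    p = realloc(o->data, ncap);
    if (p == NULL)
        return 0;
    o->data = p;
    o->cap  = ncap;
    return 1;
}

/* Append token text, but only while output is switched on. */
static int ppout_text(PPOut *o, const char *s, size_t n)
{
    if (o->skip_depth > 0)
        return 1;                 /* suppressed, not an error */
    if (!ppout_reserve(o, n))
        return 0;
    memcpy(o->data + o->len, s, n);
    o->len += n;
    o->data[o->len] = '\0';
    return 1;
}

/* Always append a newline so later stages see the original line numbers. */
static int ppout_newline(PPOut *o)
{
    if (!ppout_reserve(o, 1))
        return 0;
    o->data[o->len++] = '\n';
    o->data[o->len]   = '\0';
    return 1;
}

/* #ifdef: start skipping if already skipping or the symbol is undefined. */
static void ppout_ifdef(PPOut *o, int is_defined)
{
    if (o->skip_depth > 0 || !is_defined)
        o->skip_depth++;
}

/* #endif: leave one level of skipping, if we are in one. */
static void ppout_endif(PPOut *o)
{
    if (o->skip_depth > 0)
        o->skip_depth--;
}
```

Start from a zeroed struct (PPOut out = {0};). The resulting out.data is
an ordinary C string, so in principle it can be handed to the existing
lexer/parser in place of the original source buffer, and the caller frees
it afterwards.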


Should be fairly trivial unless your command set is larger than you show
here. Also, have you thought of just coding it in m4? I doubt that using
streams is much slower than working in memory anyway, except for huge
numbers of input files.


Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Justin Murray
> Sent: Thursday, August 30, 2012 1:27 PM
> To: antlr-interest at antlr.org
> Cc: "Robert Jacobs"@www.antlr.org
> Subject: [antlr-interest] Pre-processor advice [C target]
>
> Hello all,
>
>
>
> We have a DSL at my company, for which we have our own compiler written
> in C/C++. It is very old, monstrous, and terribly written. A little
> over a year ago, I successfully replaced the lexer and parser with an
> ANTLR implementation, and now I am tasked with replacing the
> preprocessor. I am writing to ask for some general advice on the best
> approach for this.
>
>
>
> The current process is such that we read the source file from disk into
> a memory buffer. The preprocessor works on this buffer, doing text
> transformations as necessary. This string is then passed into
> antlr3StringStreamNew(), and the ANTLR lexer and parser take over from
> there, ultimately executing the semantic actions that produce our
> binary object code. Ideally, the preprocessor would be a drop-in
> replacement in this process.
>
>
>
> The set of preprocessor commands is relatively short, and fairly
> typical:
>
> #include, #define, #undef,  #ifdef, #else, #elseif, #endif, #nosubst,
> #subst (these last 2 basically just switch the #define substitution off
> and on for a block of code)
>
>
>
> There are a few requirements that complicate this a bit:
>
> 1. The original line numbers must be preserved for later stages
> (for error messages, and status at runtime), even after multi-line
> macro substitutions
>
> 2. The rules for #define substitution are very complex. The
> allowed identifier for the macro name can contain any symbols, except
> for white space. The crazy thing is though, when searching the code
> text for possible substitutions, non-alphanumeric symbols are treated
> as both delimiters and not. The current algorithm is to identify tokens
> using white-space as a true delimiter, then identify all possible sub-
> tokens based on these partial delimiters. Each candidate sub-token is
> looked up in the table of defines, and if there is a match, the text is
> substituted. It does these largest to smallest, moving on once a
> substitution is found, or all possible tokens were tried. I suspect
> that I will still be doing this sub-token parsing and substitution by
> hand, since I don't think ANTLR supports overlapping tokens like these
> (but I would love to hear if someone has done something like this).
>
> 3. Add support for function-like macros (text substitution with
> arguments).
>
>
>
> I have spent some time searching the mailing lists and re-reading the
> ANTLR book, where I found some hints, but no clear-cut solution to my
> problems. String templates and TokenRewriteStream look the most
> promising, but as far as I can tell the TokenRewriteStream has not been
> implemented in the C target runtime. Can anyone suggest what options
> might be available to me, given these requirements?
>
>
>
> Thank you!
>
>
>
> - Justin Murray
>
>
>
>
>
>

