[antlr-interest] How to do preprocessing in antlr v4?

Mon Nov 19 11:35:31 PST 2012

Hi,
Great, the CHUNK token. I always had trouble when I wanted to ignore lines,
or parts of lines.
code/extras/CPPBaseLexer.g4 and its companion files are worth studying.
Programming this way gives great flexibility and power.

Nevertheless, I find it harder to work in the lexer than in the parser. It
took me a couple of hours to get what I wanted.
One token too many, as in CHUNK : ~'#'+ '\n' ; and it fails on a # inside a
string; one token too few, as in '#define' ID REPLACE, and you get a token
recognition error at: '#define '.
Without ~'d' in OTHER_CMD, every preprocessor statement was captured by
OTHER_CMD. It all feels fragile.
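
The CHUNK problem can be reproduced outside ANTLR. The sketch below uses
java.util.regex as a stand-in for the two CHUNK variants. This is only an
approximation of the lexer, and the names ChunkDemo and matchLength are made
up for the illustration. It tries each variant at the start of the first line
of the sample input, where a # sits inside a string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChunkDemo {
    // Regex stand-ins for the two CHUNK variants (an approximation, not ANTLR itself).
    public static final Pattern CHUNK_WITH_NEWLINE = Pattern.compile("[^#]+\\n"); // ~'#'+ '\n'
    public static final Pattern CHUNK_PLAIN = Pattern.compile("[^#]+");           // ~'#'+

    // Length of the match anchored at the start of input, or -1 if the rule cannot match there.
    public static int matchLength(Pattern rule, String input) {
        Matcher m = rule.matcher(input);
        return m.lookingAt() ? m.end() : -1;
    }

    public static void main(String[] args) {
        String line = "static char *usage_msg[] = {\"-x[directory]   strip off text before #!ruby line ...\"};\n";
        System.out.println("~'#'+ '\\n' : " + matchLength(CHUNK_WITH_NEWLINE, line));
        System.out.println("~'#'+      : " + matchLength(CHUNK_PLAIN, line));
    }
}
```

The variant with the trailing '\n' cannot match at all, because the # comes
before any newline. The plain variant stops right at the #, and the rest of
the line is then left for some other rule to pick up.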

Below are the grammar rewritten in "lexer style", a sample input, and the
resulting run.

grammar Cmacros_d;

/* Process #define statements in a C file.
   TODO : extract information from DEFINE_PARAM.
*/

program
@init {System.out.println("Cmacros_d last update 2013");}
    :   ( DEFINE_PARAM
               {System.out.print(">>>macro(parameters) " + $DEFINE_PARAM.text);}
    |     DEFINE_SIMPLE
               {System.out.print(">>>simple macro : " + $DEFINE_SIMPLE.text);}
    |     OTHER_CMD
    |     CHUNK
        )+
    ;

DEFINE_PARAM
    :   '#define' WS ID '(' WS? ID ( WS? ',' WS? ID )*  WS? ')' REPLACE
    ;

DEFINE_SIMPLE
    :   '#define' WS ID WS REPLACE
    ;

OTHER_CMD
    :   '#' ~'d' ~[\r\n]* '\r'? '\n' ; // can't use .*; scarfs \n\n after include

WS  :   [ \t]+ -> channel(HIDDEN) ;

CHUNK : ~'#'+ ; // anything else

fragment ID       : ( ID_FIRST (ID_FIRST | DIGIT)* ) ;
fragment DIGIT    : [0-9] ;
fragment ID_FIRST : LETTER | '_' ;
fragment LETTER   : [a-zA-Z] ;
fragment REPLACE  : ~[\r\n]* '\r'? '\n' ;

static char *usage_msg[] = {"-x[directory]   strip off text before #!ruby line ..."};
#ifndef CharNext
#define CharNext(p) ((p) + mblen(p, RUBY_MBCHAR_MAXSIZE))
#define CharNext    simple replacement
#endif
#define BITSTACK_PUSH(stack, n) (stack = (stack<<1)|((n)&1))

$ grun Cmacros_d program -tokens -diagnostics tcpreproc.c
[@0,0:66='static char *usage_msg[] = {"-x[directory]   strip off text before ',<5>,1:0]
[@1,67:85='#!ruby line ..."};\n',<3>,1:67]
[@2,86:102='#ifndef CharNext\n',<3>,2:0]
[@3,103:160='#define CharNext(p) ((p) + mblen(p, RUBY_MBCHAR_MAXSIZE))\n',<1>,3:0]
[@4,161:199='#define CharNext    simple replacement\n',<2>,4:0]
[@5,200:206='#endif\n',<3>,5:0]
[@6,207:267='#define BITSTACK_PUSH(stack, n)\t(stack = (stack<<1)|((n)&1))\n',<1>,6:0]
[@7,268:267='<EOF>',<-1>,7:61]
Cmacros_d last update 2013
>>>macro(parameters) #define CharNext(p) ((p) + mblen(p, RUBY_MBCHAR_MAXSIZE))
>>>simple macro : #define CharNext    simple replacement
>>>macro(parameters) #define BITSTACK_PUSH(stack, n) (stack = (stack<<1)|((n)&1))
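
For the TODO in the grammar header (extracting information from
DEFINE_PARAM), one option is to post-process the token text in plain Java.
Here is a minimal sketch, assuming the text has exactly the shape matched by
DEFINE_PARAM. MacroInfo and parse are hypothetical names, not part of ANTLR
or of the book's sample code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MacroInfo {
    // Mirrors DEFINE_PARAM: '#define' WS ID '(' parameter list ')' REPLACE
    private static final Pattern DEFINE_PARAM = Pattern.compile(
            "#define[ \\t]+([A-Za-z_][A-Za-z_0-9]*)\\(([^)]*)\\)(.*)",
            Pattern.DOTALL);

    public final String name;          // macro name, e.g. CharNext
    public final List<String> params;  // parameter names in declaration order
    public final String body;          // replacement text, trimmed

    private MacroInfo(String name, List<String> params, String body) {
        this.name = name;
        this.params = params;
        this.body = body;
    }

    // Returns null when the text does not look like a parameterized #define.
    public static MacroInfo parse(String tokenText) {
        Matcher m = DEFINE_PARAM.matcher(tokenText);
        if (!m.matches()) {
            return null;
        }
        List<String> params = new ArrayList<>();
        for (String p : m.group(2).split(",")) {
            String trimmed = p.trim();
            if (!trimmed.isEmpty()) {
                params.add(trimmed);
            }
        }
        return new MacroInfo(m.group(1), params, m.group(3).trim());
    }

    public static void main(String[] args) {
        MacroInfo m = parse("#define BITSTACK_PUSH(stack, n)\t(stack = (stack<<1)|((n)&1))\n");
        System.out.println(m.name + " " + m.params + " -> " + m.body);
    }
}
```

With the class on the classpath, the action in the program rule could call
MacroInfo.parse($DEFINE_PARAM.text) instead of just printing the text.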

2012/11/19 Terence Parr <parrt at cs.usfca.edu>

> Hi. in the extras code dir from book you'll find a C preprocessor like
> sample.
> Ter