[antlr-interest] Best practice convention for implementing a pre-processor?

Mon Dec 27 17:35:34 PST 2010

Hello all! What ANTLR conventions exist for adding preprocessor
functionality? For example:

#if TRUE
IgnoreThisIdentifier
#else
ParseThisIdentifier
#endif

Everything after the # and before the newline needs its own grammar and
parser, separate from the main language. One approach would be to build a
separate preprocessor grammar and pipe its output into the language
grammar. I shy away from this because I want to keep the ANTLR optimization
whereby all tokens point into a single loaded input string. The other
approach would be to use gated semantic predicates to partition one ANTLR
grammar into two:

grammar test;

options
{
language = CSharp3;
}

@lexer::members {
    bool m_pp = false;
}

public file
:       stmt*;

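stmt
: id NEWLINE
| pp;

// A directive: '#', its name (if, else, endif, ...), optional arguments, end of line.
pp
: POUND pp_name PP_ID* NEWLINE;

pp_name
: PP_ID;

id : ID;

// m_pp is true from a '#' to the end of that line. The gated predicates ({...}?=>)
// make POUND, ID and INT visible only outside a directive, and PP_ID only inside one.
POUND   :   {!m_pp}?=>'#' { m_pp = true; } ;
PP_ID   :   {m_pp}?=>('a'..'z'|'A'..'Z')+ ;
ID      :   {!m_pp}?=>('a'..'z'|'A'..'Z')+ ;
INT     :   {!m_pp}?=>'0'..'9'+ ;
// NEWLINE is emitted in both modes (so 'id NEWLINE' can match) and ends a directive.
NEWLINE :   '\r'? '\n' { m_pp = false; } ;
// Newlines are deliberately not whitespace here; otherwise they would be skipped.
WS      :   (' '|'\t')+ { Skip(); } ;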
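To make the mode switch concrete, here is a minimal Python sketch (not ANTLR-generated code) of what the gated predicates do: one flag, playing the role of `m_pp`, decides whether a run of letters becomes `PP_ID` or `ID`, and every token records only start/stop offsets into the one input string. It assumes NEWLINE is produced in both modes and that whitespace excludes newlines, so a plain `id NEWLINE` line can be matched. `Token` and `tokenize` are hypothetical names.

```python
import string
from typing import List, NamedTuple

LETTERS = set(string.ascii_letters)  # 'a'..'z' | 'A'..'Z'
DIGITS = set(string.digits)          # '0'..'9'

class Token(NamedTuple):
    type: str
    start: int  # offset of the first character in the shared input string
    stop: int   # offset one past the last character

def tokenize(src: str) -> List[Token]:
    """Hypothetical hand-written analogue of the predicate-gated lexer."""
    tokens: List[Token] = []
    in_pp = False  # plays the role of m_pp
    i = 0
    while i < len(src):
        c = src[i]
        if c in " \t":                                 # WS: skipped in both modes
            i += 1
        elif c == "\n" or src.startswith("\r\n", i):   # NEWLINE: always emitted, ends a directive
            n = 1 if c == "\n" else 2
            tokens.append(Token("NEWLINE", i, i + n))
            in_pp = False
            i += n
        elif c == "#" and not in_pp:                   # POUND: only outside a directive
            tokens.append(Token("POUND", i, i + 1))
            in_pp = True
            i += 1
        elif c in LETTERS:                             # PP_ID inside a directive, ID outside
            j = i
            while j < len(src) and src[j] in LETTERS:
                j += 1
            tokens.append(Token("PP_ID" if in_pp else "ID", i, j))
            i = j
        elif c in DIGITS and not in_pp:                # INT: only outside a directive
            j = i
            while j < len(src) and src[j] in DIGITS:
                j += 1
            tokens.append(Token("INT", i, j))
            i = j
        else:
            raise ValueError(f"unexpected {c!r} at offset {i}")
    return tokens

if __name__ == "__main__":
    src = "#if TRUE\nIgnoreThisIdentifier\n#else\nParseThisIdentifier\n#endif\n"
    for tok in tokenize(src):
        # Token text is recovered by slicing the one shared buffer.
        print(tok.type, repr(src[tok.start:tok.stop]))
```

The flag is checked on every token decision, which is the per-rule cost the question below is about; in this sketch it is a single boolean test.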

The drawback here is that this kind of partitioning is not first-class, so
I'm unsure of the performance implications of putting a predicate in front
of almost every lexer rule.

Is there a third approach, some declarative way to partition lexer and
parser rules?
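One possible third approach, sketched in Python as an assumption rather than an established ANTLR convention: let the main lexer turn each whole `#...` line into a single directive token, and put a filter between the lexer and the parser that evaluates `#if`/`#else`/`#endif` and drops tokens from inactive branches. Because the filter only passes tokens through, each surviving token still holds offsets into the original input string. In ANTLR this filter would roughly be a token source wrapping the lexer and feeding the parser's token stream; the code below models only that data flow and uses no ANTLR API. `Token`, `lex`, `preprocess`, and the rule that a condition is true when its name is in `defined` are all hypothetical.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple, Set

class Token(NamedTuple):
    type: str   # "DIRECTIVE" (a whole '#...' line) or "ID"
    start: int  # offsets into the one shared input string
    stop: int

# Toy main-language lexer: a directive is one token running to end of line.
TOKEN_RE = re.compile(r"(?P<DIRECTIVE>#[^\r\n]*)|(?P<ID>[A-Za-z]+)|(?P<WS>\s+)")

def lex(src: str) -> List[Token]:
    tokens: List[Token] = []
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise ValueError(f"unexpected {src[pos]!r} at offset {pos}")
        if m.lastgroup != "WS":  # whitespace is dropped, like skip()
            tokens.append(Token(m.lastgroup, m.start(), m.end()))
        pos = m.end()
    return tokens

def preprocess(src: str, tokens: Iterable[Token], defined: Set[str]) -> Iterator[Token]:
    """Yield only tokens in taken #if/#else branches; directive tokens are consumed."""
    stack: List[bool] = []  # one entry per open #if: is its current branch taken?
    for tok in tokens:
        if tok.type != "DIRECTIVE":
            if all(stack):  # active only if every enclosing branch is taken
                yield tok
            continue
        # The directive's own tiny "grammar": a name plus optional words.
        words = src[tok.start + 1:tok.stop].split()
        name = words[0] if words else ""
        if name == "if":
            stack.append(len(words) > 1 and words[1] in defined)
        elif name in ("else", "endif") and not stack:
            raise ValueError(f"#{name} without #if at offset {tok.start}")
        elif name == "else":
            stack[-1] = not stack[-1]
        elif name == "endif":
            stack.pop()
        else:
            raise ValueError(f"unknown directive {name!r} at offset {tok.start}")
    if stack:
        raise ValueError("unterminated #if")

if __name__ == "__main__":
    src = "#if TRUE\nIgnoreThisIdentifier\n#else\nParseThisIdentifier\n#endif\n"
    for tok in preprocess(src, lex(src), defined=set()):
        print(tok.type, src[tok.start:tok.stop])
```

Keeping a directive as one token means the main grammar needs no predicates for directive contents; the directive syntax is handled separately on that token's text, which could itself be a second small grammar run over just that substring.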

Thanks!
Chris