[antlr-interest] Unified grammar and # directives for a C-like language

Wed May 14 13:27:42 PDT 2003

Hi,

(First message here! Hello antlr community!) 

I am using antlr to build a parser for a C-like language. The output 
of the parser is fed to a C compiler, so in that regard the parser is 
a sort of preprocessor (a fairly intricate one at that).

I want to take care of parsing the # directives (include, define, etc.)
within one unified grammar, as opposed to writing multiple translation
stages.

So I need to detect a # at the beginning of a line, barring whitespace, 
and make it a HASH_DIRECTIVE_BEGIN token. Also, # must be recognized 
at the very beginning of a file.

So I came up with:

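// Newlines -- ignored, but bump the line number
NEWLINE
    :
    (
        options { generateAmbigWarnings=false; }
        : "\r\n"  // Evil DOS
        | '\r'    // Macintosh
        | '\n'    // Unix (the right way)
    )
    // By default the newline is skipped; newline() bumps the line counter
    { $setType(antlr::Token::SKIP); newline(); }
    // Optional tail: whitespace (WS is defined elsewhere in the lexer)
    // followed by '#' turns the whole match into a directive start
    (
        options { generateAmbigWarnings=false; } : ( WS )? '#'
        {
            $setType(HASH_DIRECTIVE_BEGIN);
        }
    )?
    ;

// Empty rule, present only so that the HASH_DIRECTIVE_BEGIN token type exists
HASH_DIRECTIVE_BEGIN : ;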

As you see, this takes care of any # that comes after some newline, 
and also lets you insert some whitespace before the '#' sign. 

However, damn! The rule above is unable to recognize a '#' appearing 
right at the beginning of the file, because no newline precedes it there.
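
To pin down the behavior I am after, here is a rough sketch in plain C++, 
independent of ANTLR. It only illustrates the idea of a "start of line" 
flag that is already set when the input begins, so the first line is 
treated like any other. The function name findDirectiveStarts is made up 
for this example; it is not part of ANTLR or of my grammar, and only 
spaces and tabs are assumed to count as whitespace.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not ANTLR API): returns the offsets of every '#'
// that is the first non-whitespace character on its line.
std::vector<std::size_t> findDirectiveStarts(const std::string& src)
{
    std::vector<std::size_t> starts;

    // The flag starts out true, so the beginning of the file behaves
    // exactly as if a newline had just been seen.
    bool atLineStart = true;

    for (std::size_t i = 0; i < src.size(); ++i) {
        const char c = src[i];
        if (c == '\n' || c == '\r') {
            atLineStart = true;          // any line terminator re-arms the flag
        } else if (c == ' ' || c == '\t') {
            // leading whitespace leaves the flag unchanged
        } else {
            if (c == '#' && atLineStart) {
                starts.push_back(i);     // this '#' opens a directive
            }
            atLineStart = false;         // any other character ends the "start"
        }
    }
    return starts;
}
```

With this, a '#' at offset 0 is reported just like one after a newline, 
while a '#' in the middle of a line is not. What I am missing is the 
clean way to express that same initial state inside the lexer rule.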

I tried a number of approaches, and I can make things work, but I 
couldn't find the "obviously elegant" solution. I am sure there is 
one. Could anyone enlighten me? Thank you!

Andrei
