[antlr-interest] Unified grammar and # directives for a C-like language

mzukowski at yci.com
Wed May 14 13:37:17 PDT 2003


You can now have semantic predicates hoisted into nextToken() for
non-protected lexer rules.  This means you can write something like this:

HASH_DIRECTIVE_BEGIN: {getColumn()==1}? '#' ;

See the docs at http://www.antlr.org/doc/lexer.html#Predicated-LL(k)_Lexing

Then you can take the '#' test out of the NEWLINE rule.  Look at the
generated code, especially the nextToken() method, to see what ANTLR is
doing for you.
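To make the idea concrete, here is a minimal C++ sketch of the kind of dispatch a hoisted column predicate produces. It is a hand-written toy and not ANTLR's actual generated nextToken(). The class ToyScanner and the token names HASH, OTHER and EOF_TOKEN are hypothetical. Because the column counter starts at 1, a '#' at the very start of the input passes the same test as a '#' that follows a newline.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical token types; a real ANTLR lexer generates its own.
enum TokenType { HASH_DIRECTIVE_BEGIN, HASH, OTHER, EOF_TOKEN };

// Toy scanner shaped like a nextToken() in which the predicate of
//   HASH_DIRECTIVE_BEGIN : {getColumn()==1}? '#' ;
// has been hoisted into the dispatch. Not ANTLR's generated code.
class ToyScanner {
public:
    explicit ToyScanner(std::string input) : in_(std::move(input)) {}

    TokenType nextToken() {
        for (;;) {
            if (pos_ >= in_.size()) return EOF_TOKEN;
            char c = in_[pos_];
            if (c == '\n') { consume(); continue; }             // NEWLINE: skipped, resets column
            if (c == ' ' || c == '\t') { consume(); continue; } // WS: skipped
            if (c == '#' && getColumn() == 1) {                 // hoisted predicate gates this alternative
                consume();
                return HASH_DIRECTIVE_BEGIN;
            }
            if (c == '#') { consume(); return HASH; }           // '#' anywhere else
            consume();
            return OTHER;                                       // everything else, one char at a time
        }
    }

    // 1-based column of the next character to be read.
    int getColumn() const { return column_; }

private:
    void consume() {
        // Tabs count as one column here (a simplification).
        if (in_[pos_] == '\n') column_ = 1; else ++column_;
        ++pos_;
    }

    std::string in_;
    std::size_t pos_ = 0;
    int column_ = 1;  // starts at 1, so beginning of file behaves like beginning of line
};
```

For the input "#a\n  #b\n#c", this sketch returns HASH_DIRECTIVE_BEGIN for the first and third '#'. It returns plain HASH for the indented one, because getColumn()==1 does not tolerate leading blanks.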

Monty

-----Original Message-----
From: uprightness_of_character [mailto:andrei at metalanguage.com]
Sent: Wednesday, May 14, 2003 1:28 PM
To: antlr-interest at yahoogroups.com
Subject: [antlr-interest] Unified grammar and # directives for a C-like language


Hi,


(First message here! Hello antlr community!) 

I am using antlr to build a parser for a C-like language. The output 
of the parser is fed to a C compiler, so in that regard the parser is 
a sort of preprocessor (a fairly intricate one at that).

I want to take care of parsing the # directives (include, define etc.)
within one unified grammar, as opposed to writing multiple translation
stages.

So I need to detect a # at the beginning of a line, ignoring leading
whitespace, and make it a HASH_DIRECTIVE_BEGIN token. A # must also be
recognized at the very beginning of a file.
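The requirement boils down to one test: the '#' must be the first non-blank character on its line, and the start of the file counts as a line start. Here is a minimal C++ sketch of that test. It assumes the whole source is available as a std::string. An ANTLR lexer reads from a stream, so inside a grammar you would have to track equivalent state yourself. The function name isDirectiveStart is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if src[pos] is a '#' that is the first
// non-blank character on its line. Walks backwards until it reaches a
// line break or the start of the buffer; offset 0 counts as a line start,
// so a '#' at the beginning of the file (after optional blanks) qualifies.
bool isDirectiveStart(const std::string& src, std::size_t pos) {
    if (pos >= src.size() || src[pos] != '#') return false;
    std::size_t i = pos;
    while (i > 0) {
        char c = src[i - 1];
        if (c == '\n' || c == '\r') return true;  // previous line break reached
        if (c != ' ' && c != '\t') return false;  // something non-blank precedes '#'
        --i;
    }
    return true;  // reached the beginning of the file
}
```

Scanning back to either a line break or offset 0 makes the beginning-of-file case fall out of the same loop, without a special case.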

So I came up with:

// Newlines -- ignored, but bump the line number 
NEWLINE 
    :
    (
        options { generateAmbigWarnings=false; }
        : "\r\n"  // Evil DOS
        | '\r'    // Macintosh
        | '\n'    // Unix (the right way)
    )
    { $setType(antlr::Token::SKIP); newline(); }
    (  
        options { generateAmbigWarnings=false; } : ( WS )? '#' 
        { 
            $setType(HASH_DIRECTIVE_BEGIN); 
        }
    )?
    ;

HASH_DIRECTIVE_BEGIN : ;
        
As you can see, this takes care of any # that follows a newline, and it
also allows some whitespace before the '#' sign.

However, damn! The rule above cannot recognize a '#' appearing right at
the beginning of the file, since no newline precedes it!

I tried a number of approaches, and I can make things work, but I 
couldn't find the "obviously elegant" solution. I am sure there is 
one. Could anyone enlighten me? Thank you!


Andrei


 

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/ 






More information about the antlr-interest mailing list