[antlr-interest] Unified grammar and # directives for a C-like language
uprightness_of_character
andrei at metalanguage.com
Wed May 14 13:27:42 PDT 2003
Hi,
(First message here! Hello antlr community!)
I am using antlr to build a parser for a C-like language. The output
of the parser is fed to a C compiler, so in that regard the parser is
a sort of preprocessor (a fairly intricate one at that).
I want to take care of parsing the # directives (include, define, etc.)
within one unified grammar, as opposed to writing multiple translation
stages.
So I need to detect a # at the beginning of a line, ignoring any leading
whitespace, and make it a HASH_DIRECTIVE_BEGIN token. A # must also be
recognized at the very beginning of a file.
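The detection rule just described can be stated as a small per-line
predicate. This is a hedged C++ sketch outside the grammar, not part of
it: `startsDirective` is a hypothetical helper, and it assumes that
"whitespace" means only spaces and tabs (the actual WS rule is not shown).

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: true if the first non-blank character of `line`
// is '#'. Assumption: only ' ' and '\t' count as leading whitespace.
bool startsDirective(std::string_view line) {
    std::size_t i = 0;
    // Skip leading blanks.
    while (i < line.size() && (line[i] == ' ' || line[i] == '\t')) {
        ++i;
    }
    // A directive begins only if the first non-blank character is '#'.
    return i < line.size() && line[i] == '#';
}
```

So "  #define X" qualifies, while "x # y" and an empty line do not.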
So I came up with:
// Newlines -- ignored, but bump the line number
NEWLINE
    :   (
            options { generateAmbigWarnings=false; }
        :   "\r\n"  // Evil DOS
        |   '\r'    // Macintosh
        |   '\n'    // Unix (the right way)
        )
        { $setType(antlr::Token::SKIP); newline(); }
        (
            options { generateAmbigWarnings=false; }
        :   ( WS )? '#'
            { $setType(HASH_DIRECTIVE_BEGIN); }
        )?
    ;

HASH_DIRECTIVE_BEGIN : ;
As you can see, this handles any # that follows a newline, and it also
allows whitespace before the '#' sign.
But damn! The rule above cannot recognize a '#' appearing right at the
beginning of the file, because no newline precedes it there.
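One common way to think about this, offered only as a hedged sketch in
plain C++ rather than ANTLR grammar, is to treat the start of the file as
just another "start of line" state: keep a flag that any newline sets and
any non-blank character clears, and initialize it to true. The function
`directiveOffsets` and its `fileStartIsLineStart` parameter are
hypothetical; passing false roughly mirrors the NEWLINE rule above, and
the sketch again assumes only spaces and tabs count as leading whitespace.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical sketch: returns the offset of every '#' that is the first
// non-blank character on its line.
// fileStartIsLineStart == false: only a '#' after a newline counts
//   (roughly what the NEWLINE rule does).
// fileStartIsLineStart == true: offset 0 is also treated as a line start.
std::vector<std::size_t> directiveOffsets(std::string_view text,
                                          bool fileStartIsLineStart) {
    std::vector<std::size_t> result;
    bool atLineStart = fileStartIsLineStart;  // the fix is this initial value
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '\r' || c == '\n') {
            // "\r\n", "\r" and "\n" all end a line; seeing "\r\n" as two
            // line ends is harmless here since no line numbers are kept.
            atLineStart = true;
        } else if (c == ' ' || c == '\t') {
            // Blanks neither start nor end a line; the flag is unchanged.
        } else {
            if (atLineStart && c == '#') {
                result.push_back(i);
            }
            atLineStart = false;  // any other character ends "line start"
        }
    }
    return result;
}
```

With the flag initialized to true, a file starting with "#include" yields
offset 0, while a '#' in the middle of a line is still ignored. In an
ANTLR lexer the same idea would presumably become a member flag tested by
a semantic predicate on the '#' rule, but that wiring is not shown here.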
I have tried a number of approaches, and I can make things work, but I
haven't found the "obviously elegant" solution. I am sure there is one.
Could anyone enlighten me? Thank you!
Andrei