[antlr-interest] Lexer problem: distinguish between TIC and CHAR_LITERAL
Martin Probst
mail at martin-probst.com
Fri Sep 16 01:19:07 PDT 2005
Hi,
> thanks for your answer.
> The concerning language is Ada95. There must be a way to correctly identify
> the tics/character literals, or else how would the Ada compilers work?
They probably have a stateful lexer. Sometimes these differences arise
simply because they use a different parsing strategy, e.g. an LALR
parser (yacc/bison).
> Is the following possible with AntLR?
> If the Lexer finds a tic, it checks if the previous token was an identifier
> token. If so, it must be a TIC token, it can't be the beginning of a
> character literal.
> I have not the slightest idea how to implement that :(
That's possible. Just declare a boolean flag in the member block at the
beginning of your Lexer:
{
boolean afterIdentifier = false;
}
And then set the flag to true/false according to the last token, e.g.
ID: ... { afterIdentifier = true; };
And have your TIC rule only fire if it's true:
TIC: { afterIdentifier }? ...;
Don't forget to reset the flag to false in the actions of all the other
token rules (including TIC itself).
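To make the flag idea concrete outside of ANTLR, here is a minimal hand-written sketch in Java. It is not what ANTLR generates; the class name TicLexerSketch, the "TYPE(text)" token strings and the catch-all OTHER token are made up for illustration. It also assumes that whitespace leaves the flag untouched and that only identifiers set it, which is a simplification of the real Ada rules.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the afterIdentifier flag; not ANTLR-generated code.
final class TicLexerSketch {

    // Returns the tokens of the input rendered as "TYPE(text)" strings.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        boolean afterIdentifier = false; // true only right after an ID token
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // assumption: whitespace yields no token and keeps the flag
                continue;
            }
            if (Character.isLetter(c)) {
                int start = i;
                while (i < input.length()
                        && (Character.isLetterOrDigit(input.charAt(i)) || input.charAt(i) == '_')) {
                    i++;
                }
                tokens.add("ID(" + input.substring(start, i) + ")");
                afterIdentifier = true;
            } else if (c == '\'' && !afterIdentifier
                    && i + 2 < input.length() && input.charAt(i + 2) == '\'') {
                // Not after an identifier and shaped like 'x': a character literal.
                tokens.add("CHAR_LITERAL(" + input.substring(i, i + 3) + ")");
                i += 3;
                afterIdentifier = false;
            } else if (c == '\'') {
                // Either directly after an identifier, or not shaped like 'x': a tic.
                tokens.add("TIC(')");
                i++;
                afterIdentifier = false;
            } else {
                // Every other token resets the flag, as noted above.
                tokens.add("OTHER(" + c + ")");
                i++;
                afterIdentifier = false;
            }
        }
        return tokens;
    }
}
```

With this sketch, TicLexerSketch.tokenize("Character'('a')") treats the first quote as a TIC because it directly follows an identifier, and the 'a' after the parenthesis as a CHAR_LITERAL.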
For stateful Lexers with ANTLR it's usually nicer to make all rules of
the Lexer protected and then have one big NEXT rule that dispatches on
the state, like this:
NEXT:
  { state == FOO }? ATOKEN { $setType(ATOKEN); }
| { state == bar }? BTOKEN { $setType(BTOKEN); }
...
but that might be overkill for you.
Martin