[antlr-interest] Lexer problem: distinguish between TIC and CHAR_LITERAL

Martin Probst mail at martin-probst.com
Fri Sep 16 01:19:07 PDT 2005


Hi,

> thanks for your answer.
> The concerning language is Ada95. There must be a way to correctly identify 
> the tics/character literals, or else how would the Ada compilers work?

They probably have a stateful lexer. Differences like this also often
arise simply because the compiler uses a different parsing strategy,
e.g. an LALR parser (yacc/bison).

> Is the following possible with AntLR?
> If the Lexer finds a tic, it checks if the previous token was an identifier 
> token. If so, it must be a TIC token, it can't be the beginning of a 
> character literal.
> I have not the slightest idea how to implement that :(

That's possible. Just store a boolean flag as a member of your Lexer,
in the action block at the beginning:
{
  // true only if the last token produced was an identifier
  boolean afterIdentifier = false;
}

Then set the flag to true or false in the action of each token rule,
according to the token it just matched, e.g.
ID: ... { afterIdentifier = true; };
And guard your TIC rule with a semantic predicate so that it only
fires if the flag is true:
TIC: { afterIdentifier }? ...;
Don't forget to reset the flag to false in all the other token rules.
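
As a rough illustration of the flag idea outside ANTLR, here is a small
hand-written Java sketch (not generated lexer code). The class
TickLexer, its tokenize helper and the "TYPE:text" token strings are
made up for this example, and it only knows identifiers, tics,
single-character literals and a catch-all OTHER token. One assumption
beyond the description above: a tic that neither follows an identifier
nor starts a 'x' pattern still falls back to TIC.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hand-written lexer showing the afterIdentifier flag. */
final class TickLexer {
    private final String input;
    private int pos = 0;
    // True only if the previously returned token was an identifier.
    private boolean afterIdentifier = false;

    TickLexer(String input) {
        this.input = input;
    }

    /** Returns all tokens as "TYPE:text" strings; whitespace is skipped. */
    static List<String> tokenize(String input) {
        TickLexer lexer = new TickLexer(input);
        List<String> tokens = new ArrayList<>();
        String token;
        while ((token = lexer.nextToken()) != null) {
            tokens.add(token);
        }
        return tokens;
    }

    /** Returns the next token, or null at end of input. */
    private String nextToken() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        if (pos >= input.length()) {
            return null;
        }
        char c = input.charAt(pos);
        if (Character.isLetter(c)) {
            int start = pos;
            while (pos < input.length()
                    && (Character.isLetterOrDigit(input.charAt(pos)) || input.charAt(pos) == '_')) {
                pos++;
            }
            afterIdentifier = true; // the only place the flag is switched on
            return "ID:" + input.substring(start, pos);
        }
        if (c == '\'') {
            // Only try a character literal if we are NOT right after an identifier.
            boolean looksLikeLiteral = pos + 2 < input.length() && input.charAt(pos + 2) == '\'';
            if (!afterIdentifier && looksLikeLiteral) {
                String text = input.substring(pos, pos + 3);
                pos += 3;
                afterIdentifier = false;
                return "CHAR_LITERAL:" + text;
            }
            pos++;
            afterIdentifier = false;
            return "TIC:'";
        }
        // Every other token switches the flag off again.
        pos++;
        afterIdentifier = false;
        return "OTHER:" + c;
    }
}
```

With this sketch, TickLexer.tokenize("Character'('a')") gives ID, TIC,
OTHER "(", CHAR_LITERAL 'a', OTHER ")". A lexer without the flag would
be tempted to read '(' as a character literal at the first tic.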

For stateful Lexers with ANTLR it's usually nicer to make all rules of
the Lexer protected and then have one big NEXT rule that dispatches on
the state, like this:

NEXT:
  { state == FOO }? ATOKEN { $setType(ATOKEN); }
  | { state == BAR }? BTOKEN { $setType(BTOKEN); }
  ..
but that might be overkill for you.

Martin


