[antlr-interest] Lexer problem: distinguish between TIC and CHAR_LITERAL

Daniel Zuberbuehler dzubi at users.sourceforge.net
Thu Sep 15 10:11:56 PDT 2005


Hi

Thanks for your answer.
The language in question is Ada 95. There must be a way to correctly identify 
the tics/character literals, or else how would the Ada compilers work?

Is the following possible with ANTLR?
If the lexer finds a tic, it checks whether the previous token was an 
identifier token. If so, it must be a TIC token; it can't be the beginning 
of a character literal.
I don't have the slightest idea how to implement that :(
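
To make the idea concrete, here is a minimal hand-written sketch of that rule in Python. It is not ANTLR code and does not use any ANTLR API; the token names (IDENT, OTHER), the tokenize function and its tic_after parameter are all made up for illustration. It also treats keywords such as "new" as identifiers, which a real Ada lexer could not do (think of "when 'a' =>"), so it only shows the previous-token check itself.

```python
import re

# Hypothetical token kinds for this sketch; these are not ANTLR names.
TOKEN_RE = re.compile(r"""
    (?P<WS>\s+)
  | (?P<IDENT>[A-Za-z][A-Za-z0-9_]*)
  | (?P<OTHER>[^\s'])
""", re.VERBOSE)


def tokenize(text, tic_after=("IDENT",)):
    """Split text into (kind, lexeme) pairs.

    A tic that follows a token whose kind is listed in tic_after is always
    a TIC. Otherwise it starts a CHAR_LITERAL if another tic sits two
    characters ahead (the LA(3) check), and is a TIC if not.
    """
    tokens = []
    prev_kind = None  # kind of the last non-whitespace token
    pos = 0
    while pos < len(text):
        if text[pos] == "'":
            is_char = (prev_kind not in tic_after
                       and pos + 2 < len(text)
                       and text[pos + 2] == "'")
            if is_char:
                tokens.append(("CHAR_LITERAL", text[pos:pos + 3]))
                prev_kind = "CHAR_LITERAL"
                pos += 3
            else:
                tokens.append(("TIC", "'"))
                prev_kind = "TIC"
                pos += 1
            continue
        # Every character other than a tic matches one of the alternatives.
        m = TOKEN_RE.match(text, pos)
        if m.lastgroup != "WS":  # whitespace does not reset prev_kind
            tokens.append((m.lastgroup, m.group()))
            prev_kind = m.lastgroup
        pos = m.end()
    return tokens


if __name__ == "__main__":
    for token in tokenize("new String'('b' & Second_Char);"):
        print(token)
```

Passing tic_after=() switches the previous-token check off and leaves only the LA(3) test, which then reads '(' as a character literal in the example from the earlier mail. More token kinds may need to go into tic_after for full Ada (a closing parenthesis is a likely candidate), but that is an assumption I have not checked against the reference manual.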

Daniel


On Tuesday 13 September 2005 15:44, you wrote:
> Hi,
>
> > The character literal is defined as two tics with one character in
> > between.
> > To decide if a tic is a TIC or the beginning of a CHARACTER_LITERAL,
> > we check if at LA(3) follows another tic or not.
>
> If this is the definition of your language, then this is (IMHO) just a
> case of the language being ambiguous. How should the parser/lexer tell
> what the user wants? In this case it's quite obvious (one case giving an
> error, another one not), but I think that you might find places where
> it's not that easy.
>
> > new String'('b' & Second_Char);
>
> If there is a certain restriction on where tics may occur and where they
> are disallowed, then you can probably write a stateful lexer working
> around the problem. But the much nicer solution would of course be a
> more sound definition of the lexical structure, avoiding these
> ambiguities.
>
> Martin

