[antlr-interest] Integer literal ending problem

Tue Dec 13 05:50:05 PST 2011

Hello Anton,

Why are two tokens a problem in that case? That is exactly what your 
lexer grammar dictates. If you want "123A" to be an error, reject it in 
the parser (not the lexer) by simply making sure that you don't have a 
parser rule like:

myrule: INT ID;

If for some reason "123A" should be invalid but "123 A" is OK, then you 
will need to stop skipping whitespace in the lexer (drop the skip() 
action from WS) and use it as part of your grammar:

myrule: INT WS ID;

This is not how most languages work, though; it is better if 
whitespace can be ignored. Usually, some other delimiter should come 
between an INT and an ID, such as an operator or a comma.
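
To make the point about skipped whitespace concrete, here is a minimal 
Python sketch (a rough stand-in for the three lexer rules, not ANTLR 
itself). The names tokenize and int_followed_by_id are hypothetical 
helpers invented for this illustration, and WS is assumed to match one 
or more characters so that it can never match the empty string. It 
shows that once whitespace is skipped, "123A" and "123 A" give the 
parser the same token stream, so only the parser rules (or an unskipped 
WS token) can tell them apart:

```python
import re

# Rough stand-in for the INT, ID and WS lexer rules from the quoted grammar.
# Assumption: WS uses + (one or more) so it never matches the empty string.
TOKEN_RE = re.compile(
    r"(?P<INT>[0-9]+)|(?P<ID>[A-Z][A-Z0-9]*)|(?P<WS>[ \r\n]+)"
)


def tokenize(text):
    """Return (type, text) pairs, dropping WS the way skip() would."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def int_followed_by_id(tokens):
    """Hypothetical parser-stage check: is any INT directly followed by an ID?"""
    return any(a[0] == "INT" and b[0] == "ID" for a, b in zip(tokens, tokens[1:]))


print(tokenize("123A"))   # two tokens: an INT, then an ID
print(tokenize("123 A"))  # same token stream, because WS was skipped
```

If no parser rule accepts an INT directly followed by an ID, both 
inputs are rejected at the parsing stage, which is the first suggestion 
above.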

- Justin

On 12/13/2011 6:46 AM, Shevchenko A wrote:
> Hello,
>
> I am trying to write some tests for the lexical parser generated with ANTLR.
> My grammar is simple:
> INT: ('0'..'9')+;
> ID: ('A'..'Z') ('A'..'Z' | '0'..'9')* ;
> WS: (' ' | '\r' | '\n')* { skip(); };
>
> With such a grammar the parser will interpret the string "123A" as 2 tokens,
> and this is undesirable.
> If I specify that integer should be ended with whitespace another problem
> will come up. Not only whitespace is the ending but also all special
> characters.
>
> So, the question is about best practices to solve the problem.
> Thanks in advance.
>
> --
> Regards,
> Anton Shevchenko,
> 1C Company, Moscow.