[antlr-interest] Integer literal ending problem

Shevchenko A ashe at 1c.ru
Tue Dec 13 21:43:13 PST 2011


Hello Justin,

Thanks for the response.
For languages like SQL, the string "SELECT 123 A ..." is valid and "SELECT
123A ..." is not.

The deeper problem is that I want to be able to distinguish an integer
literal from a decimal literal at the lexer level (i.e., "123.456" is a
decimal/numeric literal and "123" is an integer literal).

So, the more complicated grammar would be
INTEGER_LITERAL: ('0'..'9')+;
DECIMAL_LITERAL: ('0'..'9')+ '.' ('0'..'9')*;

I agree that specifying whitespace in the rule definition is not a good
idea.
But enumerating every character that is invalid after a literal is a bad
idea too.
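
One way to avoid listing every terminator is to check only for identifier
characters directly after the digits. A minimal Python sketch of that idea
follows. It is not ANTLR syntax, and the function name `lex_number` and the
exact set of identifier characters are assumptions made for illustration:

```python
import re

# Digits with an optional fractional part; the token type is decided after the match.
NUMBER_RE = re.compile(r"[0-9]+(\.[0-9]*)?")
# Assumed identifier-start characters that must not directly follow a number.
IDENT_CHAR_RE = re.compile(r"[A-Za-z_]")

def lex_number(text, pos=0):
    """Return (type, lexeme, end) for the number starting at pos."""
    m = NUMBER_RE.match(text, pos)
    if m is None:
        raise ValueError(f"no number at position {pos}")
    end = m.end()
    # Reject "123A": a letter glued to the digits makes the literal malformed.
    if end < len(text) and IDENT_CHAR_RE.match(text, end):
        raise ValueError(f"malformed number {text[pos:end + 1]!r} at position {pos}")
    # A matched fractional group means decimal; otherwise integer.
    kind = "DECIMAL_LITERAL" if m.group(1) is not None else "INTEGER_LITERAL"
    return kind, m.group(), end
```

With this sketch, "123 A" yields an integer literal followed by further
input, while "123A" is rejected at the lexical level. Operators, commas and
other special characters need no explicit listing, because only identifier
characters are checked.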

--
Regards,
Anton Shevchenko,
1C Company, Moscow.

--
Hello Anton,

Why are two tokens a problem in that case? That is exactly what your lexer
grammar dictates. If you want "123A" to error, make it error in the parsing
stage (not during lexing) by simply making sure that you don't have a rule
like:

myrule: INT ID;

If for some reason "123A" should be invalid, but "123 A" is ok, then you
will need to use whitespace as part of your grammar:

myrule: INT WS ID;

This is not how most languages work, though; it is better if whitespace
can be ignored. Usually, some other delimiter should come between an INT
and an ID, such as an operator or a comma.
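
To make the trade-off concrete, here is a minimal Python sketch of a lexer
with the same three kinds of token (INT, ID, WS). It is not ANTLR, and the
`tokenize` helper and its `keep_ws` flag are hypothetical names chosen for
illustration. Once whitespace is skipped, "123A" and "123 A" produce the
same token stream, so a parser rule can only tell them apart if WS tokens
are kept:

```python
import re

# Hypothetical token rules mirroring INT, ID and WS.
TOKEN_RE = re.compile(r"(?P<INT>[0-9]+)|(?P<ID>[A-Z][A-Z0-9]*)|(?P<WS>[ \r\n]+)")

def tokenize(text, keep_ws=False):
    """Return (type, text) pairs; whitespace is dropped unless keep_ws is set."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        # m.lastgroup is the name of the alternative that matched.
        if m.lastgroup != "WS" or keep_ws:
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

Calling `tokenize("123A")` and `tokenize("123 A")` gives the same two
tokens, whereas `tokenize("123 A", keep_ws=True)` also reports the WS token
between them.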

- Justin

On 12/13/2011 6:46 AM, Shevchenko A wrote:
> Hello,
>
> I am trying to write some tests for the lexical parser generated with
> ANTLR.
> My grammar is simple:
> INT: ('0'..'9')+;
> ID: ('A'..'Z') ('A'..'Z' | '0'..'9')* ;
> WS: (' ' | '\r' | '\n')+ { skip(); };
>
> With such a grammar the lexer will interpret the string "123A" as 2
> tokens, and this is undesirable.
> If I specify that an integer must be followed by whitespace, another
> problem comes up: an integer can be terminated not only by whitespace
> but also by any special character.
>
> So, the question is about best practices to solve the problem.
> Thanks in advance.
>
> --
> Regards,
> Anton Shevchenko,
> 1C Company, Moscow.



