[antlr-interest] Problem when parsing numerics

Wed Feb 18 01:02:49 PST 2009

Well, there is a bug in my earlier explanation. I confused it with a problem I was facing myself. Your problem is actually simpler; here is the explanation.

The lexer sees "1." and, since the lexer always forms the longest token it can, it produces a single NUMERIC token with the text "1." and hands that to your parser. But your parser is expecting NUMERIC followed by a '.', and no separate '.' token is left. So parsing fails. Simple.
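
To illustrate the longest-match behaviour, here is a minimal Java sketch. It is not the code ANTLR generates; the class name, the regex-based rules, and in particular the assumption that NUMERIC accepts a trailing dot are hypothetical, chosen only to show how the dot can disappear into the number token.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyLexerSketch {
    // Hypothetical token rules. NUMERIC is ASSUMED to accept a trailing dot.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("NUMERIC", Pattern.compile("[0-9]+(\\.[0-9]*)?"));
        RULES.put("DOT", Pattern.compile("\\."));
    }

    // At each position, try every rule and keep the longest match.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input).region(pos, input.length());
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestName = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("No token matches at index " + pos);
            }
            tokens.add(bestName + "(" + input.substring(pos, bestEnd) + ")");
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The dot is swallowed by NUMERIC, so no DOT token reaches the parser.
        System.out.println(tokenize("1.")); // prints [NUMERIC(1.)]
    }
}
```

With only one token in the stream, a parser rule of the form NUMERIC '.' has nothing left to match the '.' against.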

- Indhu

----- Original Message -----
From: Indhu Bharathi <indhu.b at s7software.com>
To: Thomas Woelfle <thomas.woelfle at interactive-objects.com>
Cc: antlr-interest at antlr.org, jimi at temporal-wave.com
Sent: Wednesday, February 18, 2009 2:27:19 PM GMT+0530 Asia/Calcutta
Subject: Re: [antlr-interest] Problem when parsing numerics

The following grammar will fix your problem.

-------------------------------------------
grammar Test;

options {language=Java;}

foo     :     numeric DOT;

numeric :    NUMBER (DOT NUMBER)?;

NUMBER	:	'0'..'9'+
	;

DOT	:	'.'
	;

--------------------------------------------
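
As a rough illustration of why this split helps, here is a hand-written Java sketch of the same idea. It is not what ANTLR generates from the grammar; the class and method names are hypothetical, and unlike rule foo it additionally assumes the whole input must be consumed. The lexer part only knows NUMBER and DOT, and the decision whether a dot belongs to the number is made on the parser side by looking at the token after the dot.

```java
import java.util.ArrayList;
import java.util.List;

class SplitTokensSketch {
    // Lexer side: only NUMBER ('0'..'9'+) and DOT ('.') exist,
    // so a dot is never absorbed into a number token.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (c == '.') {
                tokens.add("DOT");
                pos++;
            } else if (c >= '0' && c <= '9') {
                while (pos < input.length()
                        && input.charAt(pos) >= '0' && input.charAt(pos) <= '9') {
                    pos++;
                }
                tokens.add("NUMBER");
            } else {
                throw new IllegalArgumentException("Unexpected character at index " + pos);
            }
        }
        return tokens;
    }

    // Parser side: foo : numeric DOT ;  numeric : NUMBER (DOT NUMBER)? ;
    static boolean matchesFoo(String input) {
        List<String> t = tokenize(input);
        int i = 0;
        if (!at(t, i, "NUMBER")) {
            return false;
        }
        i++;
        // Take the optional fraction only when DOT is followed by NUMBER.
        if (at(t, i, "DOT") && at(t, i + 1, "NUMBER")) {
            i += 2;
        }
        // The closing DOT must be the last token (assumption of this sketch).
        return at(t, i, "DOT") && i + 1 == t.size();
    }

    static boolean at(List<String> t, int i, String type) {
        return i < t.size() && t.get(i).equals(type);
    }

    public static void main(String[] args) {
        System.out.println(matchesFoo("1."));   // prints true
        System.out.println(matchesFoo("1.5.")); // prints true
        System.out.println(matchesFoo("1.5"));  // prints false
    }
}
```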

I don't know the exact reason why this occurs, but I will try my best to explain.

The lexer always tries to form the longest token possible. In this case (input "1."), on seeing the '.' the lexer predicts '0'..'9'+ ('.' '0'..'9'+), assuming the optional second part (the one under ?) is present, then runs the DFA, and the DFA fails. I guess it is generally not a good idea to have two lexer rules R1 and R2 where R1 starts with R2.

It would be good if someone could add more clarity to this explanation.

- Indhu

----- Original Message -----
From: Thomas Woelfle <thomas.woelfle at interactive-objects.com>
To: jimi at temporal-wave.com
Cc: antlr-interest at antlr.org
Sent: Wednesday, February 18, 2009 1:21:02 PM GMT+0530 Asia/Calcutta
Subject: Re: [antlr-interest] Problem when parsing numerics

Hi Jim,

thanks for the tip. This lexer grammar for floating points is quite 
impressive and answers some interesting questions for me. But it did not 
solve my problem. Using the token rule 'FLOATING_POINT_LITERAL' in my 
grammar results in the same MismatchedTokenException. My adjusted 
grammar is:

foo     :     FLOATING_POINT_LITERAL '.';

where "FLOATING_POINT_LITERAL" is the rule from your example.

Parsing the input string "1.5." results in a MismatchedTokenException. 
Any idea what is going wrong?

Regards,
Thomas
> Thomas Woelfle wrote:
>   
>> Hi,
>>
>> I've been running into a strange problem using ANTLR 3.1.1. I don't know 
>> whether it is a bug in my grammar or a bug in ANTLR.
>> In the language that has to be parsed, the following lines are legal strings:
>>
>> 1.
>> 1.5.
>>
>> There is a rule where a numeric is followed by a dot.
>>   
>>     
> Please look in the FAQ/examples:
>
> http://www.antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point%2C+dot%2C+range%2C+time+specs
>
> You should be able to simplify the grammar here to just what you need.
>
> Jim
>
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>   

-- 
Interactive Objects Software GmbH
Basler Strasse 61
79100 Freiburg, Germany

Phone:  +49 761 400 73 0
mailto:thomas.woelfle at interactive-objects.com

Interactive Objects Software GmbH | Freiburg | Geschäftsführer: Alberto Perandones, Andrea Hemprich
| AG Frbg. HRB 5810 | USt-ID: DE 197983057
