[antlr-interest] Problem when parsing numerics

Indhu Bharathi indhu.b at s7software.com
Wed Feb 18 01:19:54 PST 2009


Looks like I'm half asleep today :-) My previous explanation (that the problem is in the lexer) is in fact correct. 

Try running a test program which just gets all tokens from the lexer and does no parsing as shown below: 


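import org.antlr.runtime.ANTLRFileStream;
import org.antlr.runtime.CommonTokenStream;

public class __Test__ {

    public static void main(String[] args) throws Exception {
        // Lex only: no parser is created, so any error reported comes from the lexer.
        TestLexer lex = new TestLexer(new ANTLRFileStream("/root/workspace/Test/output/__Test___input.txt"));
        CommonTokenStream tokens = new CommonTokenStream(lex);

        // Forces the stream to pull every token from the lexer.
        tokens.getTokens();
    }
}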

With the input "1.", I get the error 

line 1:2 required (...)+ loop did not match anything at character '<EOF>' 

- Indhu 


----- Original Message ----- 
From: Indhu Bharathi <indhu.b at s7software.com> 
To: Thomas Woelfle <thomas.woelfle at interactive-objects.com> 
Cc: antlr-interest at antlr.org 
Sent: Wednesday, February 18, 2009 2:32:49 PM GMT+0530 Asia/Calcutta 
Subject: Re: [antlr-interest] Problem when parsing numerics 


Well.. There is a bug in my explanation. I got confused with a problem I was facing. Your problem is actually simpler. Here goes the explanation. 

The lexer sees "1." and, since the lexer always forms the longest token it can, it produces a NUMERIC token with the text "1.", which is then passed to your parser. But your parser is expecting NUMERIC followed by a '.', so parsing fails. Simple. 

- Indhu 


----- Original Message ----- 
From: Indhu Bharathi <indhu.b at s7software.com> 
To: Thomas Woelfle <thomas.woelfle at interactive-objects.com> 
Cc: antlr-interest at antlr.org, jimi at temporal-wave.com 
Sent: Wednesday, February 18, 2009 2:27:19 PM GMT+0530 Asia/Calcutta 
Subject: Re: [antlr-interest] Problem when parsing numerics 

The following grammar will fix your problem. 

------------------------------------------- 
grammar Test; 

options {language=Java;} 

foo : numeric DOT; 

numeric : NUMBER (DOT NUMBER)?; 

NUMBER : '0'..'9'+ 
; 

DOT : '.' 
; 

-------------------------------------------- 
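
To make the effect of this grammar concrete, here is a minimal hand-written sketch in plain Java of what the two lexer rules and the two parser rules do. It is not the code ANTLR generates; the class and method names (SplitNumberSketch, tokenize, matchesFoo) are made up for illustration, and unlike the rule foo above the sketch also requires that no tokens are left over after the final DOT.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the generated lexer/parser; names are illustrative only.
class SplitNumberSketch {

    // Lexer part: NUMBER is a run of '0'..'9', DOT is a single '.'.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c >= '0' && c <= '9') {
                int start = i;
                while (i < input.length() && input.charAt(i) >= '0' && input.charAt(i) <= '9') {
                    i++;
                }
                tokens.add("NUMBER(" + input.substring(start, i) + ")");
            } else if (c == '.') {
                tokens.add("DOT");
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    // Parser part: foo : numeric DOT ;  numeric : NUMBER (DOT NUMBER)? ;
    static boolean matchesFoo(List<String> tokens) {
        int p = 0;
        if (p >= tokens.size() || !tokens.get(p).startsWith("NUMBER")) {
            return false;
        }
        p++;
        // Take the optional fraction only if DOT is really followed by NUMBER
        // (two tokens of lookahead), so a trailing DOT is left for foo.
        if (p + 1 < tokens.size()
                && tokens.get(p).equals("DOT")
                && tokens.get(p + 1).startsWith("NUMBER")) {
            p += 2;
        }
        // Assumption of this sketch: the final DOT must be the last token.
        return p == tokens.size() - 1 && tokens.get(p).equals("DOT");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1.", "1.5.", "1.5"}) {
            List<String> tokens = tokenize(s);
            System.out.println(s + " -> " + tokens + " matches foo: " + matchesFoo(tokens));
        }
    }
}
```

The point of the design is that the decision "is this dot part of the number or the terminator?" moves from the lexer to the parser, where it can look at the token after the DOT before committing.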


I don't know the exact reason why this occurs, but I will try my best to explain. 

The lexer always tries to form the longest token possible. In this case (1.), on seeing the '.' the lexer 'predicts' the input to be '0'..'9'+ ('.' '0'..'9'+), assuming the optional second part (under ?) is present. It then runs the DFA, and the DFA fails. I guess it is generally not a good idea to have two lexer rules R1 and R2 where R1 starts with R2. 

It would be good if someone could add more clarity to the explanation. 
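
The behaviour described above can be modelled in a few lines of plain Java. This is only a simplified sketch, assuming the lexer decides to enter the optional part after looking at a single character; it is not ANTLR's actual generated code, and the names GreedyNumericSketch and matchCommitting are invented for the example. The rule being modelled is a hypothetical NUMERIC : '0'..'9'+ ('.' '0'..'9'+)? ;

```java
// Hypothetical model of a lexer rule NUMERIC : '0'..'9'+ ('.' '0'..'9'+)? ;
class GreedyNumericSketch {

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Matches one NUMERIC at the start of s and returns the index just past it.
    // Assumption: one character of lookahead decides whether the optional
    // fraction is entered, and there is no backtracking afterwards.
    static int matchCommitting(String s) {
        int i = 0;
        while (i < s.length() && isDigit(s.charAt(i))) {
            i++;
        }
        if (i == 0) {
            throw new IllegalStateException("required (...)+ loop did not match anything at index 0");
        }
        if (i < s.length() && s.charAt(i) == '.') {
            i++; // committed to the fraction on seeing '.' alone
            int fractionStart = i;
            while (i < s.length() && isDigit(s.charAt(i))) {
                i++;
            }
            if (i == fractionStart) {
                // No digit after the '.', and it is too late to give the '.' back.
                throw new IllegalStateException(
                        "required (...)+ loop did not match anything at index " + i);
            }
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(matchCommitting("1.5.")); // consumes "1.5", stops before the last '.'
        try {
            matchCommitting("1.");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

For the input "1." the model fails at index 2, the same position as the 1:2 in the lexer error for that input, while "1.5." lexes cleanly because a digit does follow the first dot.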

- Indhu 


----- Original Message ----- 
From: Thomas Woelfle <thomas.woelfle at interactive-objects.com> 
To: jimi at temporal-wave.com 
Cc: antlr-interest at antlr.org 
Sent: Wednesday, February 18, 2009 1:21:02 PM GMT+0530 Asia/Calcutta 
Subject: Re: [antlr-interest] Problem when parsing numerics 

Hi Jim, 

thanks for the tip. This lexer grammar for floating points is quite 
impressive and answers some interesting questions to me. But it did not 
solve my problem. Using the token rule 'FLOATING_POINT_LITERAL' in my 
grammar results in the same MismatchedTokenException. My adjusted 
grammar is: 

foo : FLOATING_POINT_LITERAL '.'; 

where "FLOATING_POINT_LITERAL" is the rule from your example. 

Parsing the input string "1.5." results in a MismatchedTokenException. 
Any idea what is going wrong? 

Regards, 
Thomas 
> Thomas Woelfle wrote: 
> 
>> Hi, 
>> 
>> I've been running into a strange problem using ANTLR 3.1.1. I don't know 
>> whether it is a bug in my grammar or a bug in ANTLR. 
>> In the language that has to be parsed, the following lines are legal strings: 
>> 
>> 1. 
>> 1.5. 
>> 
>> There is a rule where a numeric is followed by a dot. 
>> 
>> 
> Please look in the FAQ/examples: 
> 
> http://www.antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point%2C+dot%2C+range%2C+time+specs 
> 
> You should be able to simplify the grammar here to just what you need. 
> 
> Jim 
> 
> List: http://www.antlr.org/mailman/listinfo/antlr-interest 
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address 
> 


-- 
Interactive Objects Software GmbH 
Basler Strasse 61 
79100 Freiburg, Germany 

Phone: +49 761 400 73 0 
mailto:thomas.woelfle at interactive-objects.com 


