[antlr-interest] Problem when parsing numerics

Thomas Woelfle thomas.woelfle at interactive-objects.com
Wed Feb 18 06:43:36 PST 2009


Hi Indhu,

thanks for your reply. You are right: the lexer tries to find the 
longest valid next token. But given my sample grammar and the sample 
input '1.', the first valid token is '1', which is a NUMERIC, and the 
next token is '.'. It is correct that the NUMERIC rule cannot match '1.', 
since that is not a valid NUMERIC token. What it should match is '1', 
which is a valid NUMERIC token.

What I don't understand is why the lexer assumes that if there is a '.' 
after some DIGITs it has to be a NUMERIC.

foo     :     NUMERIC '.';

NUMERIC :    '0'..'9'+ ('.' '0'..'9'+)?;


The NUMERIC rule defines that after the initial digits there may be a 
'.' followed by at least one digit. So the lexer's prediction that it 
is still inside a NUMERIC as soon as it sees a '.' after some digits 
isn't correct, is it?
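
To make the expected behaviour concrete, here is a minimal hand-written 
sketch in plain Java. It is not ANTLR-generated code: the class and 
method names are made up for illustration, and the lookPastDot = false 
branch is only a simplified model of the failure reported in this 
thread, not a description of ANTLR's actual prediction logic.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written sketch, NOT ANTLR-generated code. All names are hypothetical.
final class NumericLexerSketch {

    // True if position i is inside the input and holds a character '0'..'9'.
    private static boolean isDigit(String s, int i) {
        return i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9';
    }

    /**
     * Tokenizes input made of digits and dots.
     * lookPastDot = false: the optional fraction is entered on '.' alone
     *                      (simplified model of the reported failure).
     * lookPastDot = true:  the fraction is entered only if a digit follows
     *                      the '.', i.e. two characters of lookahead.
     */
    static List<String> tokenize(String input, boolean lookPastDot) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int start = i;
            if (isDigit(input, i)) {
                while (isDigit(input, i)) i++;                 // '0'..'9'+
                boolean dot = i < input.length() && input.charAt(i) == '.';
                if (dot && (!lookPastDot || isDigit(input, i + 1))) {
                    i++;                                       // consume '.'
                    if (!isDigit(input, i)) {
                        throw new IllegalStateException(
                            "required (...)+ loop did not match anything at index " + i);
                    }
                    while (isDigit(input, i)) i++;             // '0'..'9'+
                }
                tokens.add("NUMERIC(" + input.substring(start, i) + ")");
            } else if (input.charAt(i) == '.') {
                i++;
                tokens.add("DOT");
            } else {
                throw new IllegalStateException("unexpected character at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("1.", true));   // [NUMERIC(1), DOT]
        System.out.println(tokenize("1.5.", true)); // [NUMERIC(1.5), DOT]
        try {
            tokenize("1.", false);                  // commits to the fraction, then fails
        } catch (IllegalStateException e) {
            System.out.println("error: " + e.getMessage());
        }
    }
}
```

With lookPastDot = true the input '1.' yields NUMERIC followed by a 
separate '.', which is what rule foo needs. With lookPastDot = false 
the sketch fails at the end of the input, much like the lexer error 
quoted below.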

Any ideas?

Regards,
Thomas



>
> Looks like I'm half asleep today :-) My previous explanation (that the 
> problem is in the lexer) is in fact correct.
>
> Try running a test program which just gets all tokens from the lexer 
> and does no parsing as shown below:
>
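> import org.antlr.runtime.ANTLRFileStream;
> import org.antlr.runtime.CommonTokenStream;
>
> public class __Test__ {
>
>     public static void main(String[] args) throws Exception {
>         // TestLexer is the lexer class ANTLR generates from grammar Test.
>         TestLexer lex = new TestLexer(
>             new ANTLRFileStream("/root/workspace/Test/output/__Test___input.txt"));
>         CommonTokenStream tokens = new CommonTokenStream(lex);
>
>         // Pull every token from the lexer; no parser is involved.
>         tokens.getTokens();
>     }
> }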
>
> With the input "1.", I get the error
>
> line 1:2 required (...)+ loop did not match anything at character '<EOF>'
>
> - Indhu
>
>
> ----- Original Message -----
> From: Indhu Bharathi <indhu.b at s7software.com>
> To: Thomas Woelfle <thomas.woelfle at interactive-objects.com>
> Cc: antlr-interest at antlr.org
> Sent: Wednesday, February 18, 2009 2:32:49 PM GMT+0530 Asia/Calcutta
> Subject: Re: [antlr-interest] Problem when parsing numerics
>
>
> Well... there is a bug in my explanation. I got confused with a problem 
> I was facing. Your problem is actually simpler. Here is the explanation.
>
> The lexer sees "1." and, since the lexer always forms tokens with the 
> maximum string length possible, it forms a NUMERIC token with the string 
> "1.", and this comes to your parser. But what your parser is expecting is 
> NUMERIC followed by a '.'. So parsing fails. Simple.
>
> - Indhu
>
>
> ----- Original Message -----
> From: Indhu Bharathi <indhu.b at s7software.com>
> To: Thomas Woelfle <thomas.woelfle at interactive-objects.com>
> Cc: antlr-interest at antlr.org, jimi at temporal-wave.com
> Sent: Wednesday, February 18, 2009 2:27:19 PM GMT+0530 Asia/Calcutta
> Subject: Re: [antlr-interest] Problem when parsing numerics
>
> The following grammar will fix your problem.
>
> -------------------------------------------
> grammar Test;
>
> options {language=Java;}
>
> foo     :     numeric DOT;
>
> numeric :    NUMBER (DOT NUMBER)?;
>
> NUMBER        :        '0'..'9'+
>         ;
>
> DOT        :        '.'
>         ;
>         
> --------------------------------------------
>
>
> I don't know the exact reason why this occurs. I will try my best to 
> explain.
>
> The lexer always tries to form a token with the maximum string length 
> possible. In this case (1.), on seeing the '.', the lexer 'predicts' the 
> input to be '0'..'9'+ ('.' '0'..'9'+), assuming the optional second part 
> (under ?) is present, then runs the DFA, and the DFA fails. I guess it is 
> generally not a good idea to have two lexer rules R1 and R2 where R1 
> starts with R2.
>
> It would be good if someone could add more clarity to the explanation.
>
> - Indhu
>
>
> ----- Original Message -----
> From: Thomas Woelfle <thomas.woelfle at interactive-objects.com>
> To: jimi at temporal-wave.com
> Cc: antlr-interest at antlr.org
> Sent: Wednesday, February 18, 2009 1:21:02 PM GMT+0530 Asia/Calcutta
> Subject: Re: [antlr-interest] Problem when parsing numerics
>
> Hi Jim,
>
> thanks for the tip. This lexer grammar for floating points is quite
> impressive and answers some interesting questions for me. But it did not
> solve my problem. Using the token rule 'FLOATING_POINT_LITERAL' in my
> grammar results in the same MismatchedTokenException. My adjusted
> grammar is:
>
> foo     :     FLOATING_POINT_LITERAL '.';
>
> where "FLOATING_POINT_LITERAL" is the rule from your example.
>
> Parsing the input string "1.5." results in a MismatchedTokenException.
> Any idea what is going wrong?
>
> Regards,
> Thomas
> > Thomas Woelfle wrote:
> >  
> >> Hi,
> >>
> >> I've been running into a strange problem using ANTLR 3.1.1. I don't know
> >> whether it is a bug in my grammar or a bug in ANTLR.
> >> In the language that has to be parsed, the following lines are legal strings:
> >>
> >> 1.
> >> 1.5.
> >>
> >> There is a rule where a numeric is followed by a dot.
> >>  
> >>    
> > Please look in the FAQ/examples:
> >
> > http://www.antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point%2C+dot%2C+range%2C+time+specs
> >
> > You should be able to simplify the grammar here to just what you need.
> >
> > Jim
> >
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > Unsubscribe: 
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> >  
>
>
> -- 
> Interactive Objects Software GmbH
> Basler Strasse 61
> 79100 Freiburg, Germany
>
> Phone:  +49 761 400 73 0
> mailto:thomas.woelfle at interactive-objects.com
>
>
> ------------------------------------------------------------------------
>
> Interactive Objects' Legacy Modernization Solutions
>
> Get Your Applications SOA-Ready!
>
> See http://www.interactive-objects.com/ for more information.
>
> ------------------------------------------------------------------------
>
>
> Interactive Objects Software GmbH | Freiburg | Geschäftsführer: 
> Alberto Perandones, Andrea Hemprich
> | AG Frbg. HRB 5810 | USt-ID: DE 197983057
>
>




