[antlr-interest] Problem when parsing numerics

Thomas Woelfle thomas.woelfle at interactive-objects.com
Thu Feb 19 00:54:03 PST 2009


Hi Jim,
> You have to tell it what to do to verify its selection. the '.' tells it 
> to look for 0..9 and that fails. Then you have auto-generated a lexer 
> rule for '.' and made it all ambiguous ;-). Rule number one if you are 
> not yet very familiar with ANTLR is to NOT put 'literals' in your 
> parser. It tempts you to think that the lexer is being driven by the 
> parser, but the lexer runs all the way through the input first.
>
> For your simple rule, you can have:
>
>
> foo : NUMERIC DOT;
>
>
> NUMERIC  : ('0'..'9')+ ( ('.' '0'..'9')=> '.' ('0'..'9')+ )? ;
> DOT : '.' ;
>
> But that precludes:
>
> 5.
>
> from being a floating point number of course.
>
>
> Jim
>
OK, you're right. Putting literals in a parser rule isn't a good idea. 
But even after factoring "." out into a DOT lexer rule, the result is 
the same.
I think my problem is not the parser but the lexer. As far as I know, a 
lexer usually tokenizes an input stream by always trying to find the 
longest valid token. In my case the valid tokens are NUMERICs and DOTs, 
where a NUMERIC can be "5", "5.5", or "123.56".
"5." is not a NUMERIC. Given that input, I would have expected 
the lexer to split it into two tokens: NUMERIC "5" and DOT ".".
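
To make that expectation concrete, here is a minimal sketch in plain Java 
that does not use ANTLR at all. It assumes a backtracking regular 
expression is an acceptable stand-in for the two rules; the class name 
LongestMatchSketch and the token labels are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LongestMatchSketch
{
    // Group 1 mirrors NUMERIC, group 2 mirrors DOT. The two alternatives
    // start with different characters, so ordered alternation is enough here.
    private static final Pattern TOKEN =
        Pattern.compile("(\\d+(?:\\.\\d+)?)|(\\.)");

    public static List<String> tokenize(final String input)
    {
        final List<String> tokens = new ArrayList<String>();
        final Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length())
        {
            // Try to match a token starting exactly at pos.
            m.region(pos, input.length());
            if (!m.lookingAt())
            {
                throw new IllegalArgumentException("no viable token at index " + pos);
            }
            final String type = (m.group(1) != null) ? "NUMERIC" : "DOT";
            tokens.add(type + "(" + m.group() + ")");
            pos = m.end();
        }
        return tokens;
    }

    public static void main(final String[] args)
    {
        System.out.println(tokenize("5."));     // [NUMERIC(5), DOT(.)]
        System.out.println(tokenize("123.56")); // [NUMERIC(123.56)]
    }
}
```

The regex engine backs out of the optional fraction when no digit follows 
the ".", which is the behaviour I expected from the generated lexer.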

I have reduced the grammar to a lexer grammar containing only the two 
rules for NUMERIC and DOT:

lexer grammar Simple;

options {language=Java;}

@header
{
package test;
}

NUMERIC  : ('0'..'9')+ ('.' ('0'..'9')+)? ;
DOT : '.' ;

Then I tried to print all tokens for the input "5." using the 
following simple test runner:

package test;

import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;

public class Main
{
    public static void main(final String[] args)
    {
        final Simple lexer = new Simple(new ANTLRStringStream("5."));
        final CommonTokenStream stream = new CommonTokenStream(lexer);
        for(final Object token : stream.getTokens())
        {
            System.out.println(token);
        }
    }
}

The result is the same: the lexer cannot tokenize that string. It works 
with the input strings "5" and "5.5." but not with "5.". Using a 
syntactic predicate, as you suggested in your previous mail, resulted in 
the same exception.
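
The failure is at least consistent with the explanation quoted above: 
once the lexer sees "." it looks for a digit, and that fails. The 
following hand-written Java sketch models that difference. It is not 
ANTLR's generated code, and the class name LookaheadSketch and the 
peekTwo flag are hypothetical. With peekTwo set to false the tokenizer 
commits to the fraction after seeing only the "."; with true it also 
checks the character after the "." before consuming anything.

```java
import java.util.ArrayList;
import java.util.List;

public class LookaheadSketch
{
    private static boolean isDigit(final String s, final int i)
    {
        return i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9';
    }

    public static List<String> tokenize(final String input, final boolean peekTwo)
    {
        final List<String> tokens = new ArrayList<String>();
        int pos = 0;
        while (pos < input.length())
        {
            final int start = pos;
            if (isDigit(input, pos))
            {
                while (isDigit(input, pos)) { pos++; }
                final boolean sawDot = pos < input.length() && input.charAt(pos) == '.';
                // One character of lookahead: any '.' starts a fraction.
                // Two characters: only a '.' followed by a digit does.
                final boolean enterFraction =
                    peekTwo ? (sawDot && isDigit(input, pos + 1)) : sawDot;
                if (enterFraction)
                {
                    pos++; // consume '.'
                    if (!isDigit(input, pos))
                    {
                        throw new IllegalStateException("expected digit at index " + pos);
                    }
                    while (isDigit(input, pos)) { pos++; }
                }
                tokens.add("NUMERIC(" + input.substring(start, pos) + ")");
            }
            else if (input.charAt(pos) == '.')
            {
                pos++;
                tokens.add("DOT(.)");
            }
            else
            {
                throw new IllegalStateException("unexpected character at index " + pos);
            }
        }
        return tokens;
    }

    public static void main(final String[] args)
    {
        System.out.println(tokenize("5.", true)); // [NUMERIC(5), DOT(.)]
        try
        {
            tokenize("5.", false);
        }
        catch (final IllegalStateException e)
        {
            System.out.println(e.getMessage()); // expected digit at index 2
        }
    }
}
```

The two-character check is roughly what the ('.' '0'..'9')=> predicate is 
meant to express. This sketch does not show why the predicate made no 
difference in my test.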

Regards,
Thomas

-- 
Interactive Objects Software GmbH
Basler Strasse 61
79100 Freiburg, Germany

Phone:  +49 761 400 73 0
mailto:thomas.woelfle at interactive-objects.com

