[antlr-interest] Problem when parsing numerics
Thomas Woelfle
thomas.woelfle at interactive-objects.com
Thu Feb 19 00:54:03 PST 2009
Hi Jim,
> You have to tell it what to do to verify its selection. the '.' tells it
> to look for 0..9 and that fails. Then you have auto-generated a lexer
> rule for '.' and made it all ambiguous ;-). Rule number one if you are
> not yet very familiar with ANTLR is to NOT put 'literals' in your
> parser. It tempts you to think that the lexer is being driven by the
> parser, but the lexer runs all the way through the input first.
>
> For your simple rule, you can have:
>
>
> foo : NUMERIC DOT;
>
>
> NUMERIC : ('0'..'9')+ ( ('.' '0'..'9')=> '.' ('0'..'9')+) ;
> DOT : '.' ;
>
> But that precludes:
>
> 5.
>
> from being a floating point number of course.
>
>
> Jim
>
>
OK, you're right. Putting literals in a parser rule isn't a good idea.
But even after factoring "." out into a DOT lexer rule, the result is
the same.
I think my problem is not the parser but the lexer. As far as I know, a
lexer usually tokenizes an input stream by always trying to find the
longest valid token. In my case the valid tokens are NUMERICs and DOTs,
where a NUMERIC can be "5", "5.5", or "123.56".
"5." is not a NUMERIC. Given the input "5.", I would have expected
the lexer to split it into two tokens: NUMERIC "5" and DOT ".".
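To make the expected behaviour concrete, here is a minimal hand-written sketch in plain Java of a longest-match scanner for these two token types. It is not ANTLR-generated code; the class name LongestMatchSketch and the string representation of the tokens are assumptions made only for illustration. The scanner consumes the fractional part only when a digit follows the '.', so a trailing '.' is left over for DOT.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hand-written scanner; not what ANTLR generates. */
class LongestMatchSketch {

    static boolean isDigit(String s, int i) {
        return i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9';
    }

    /** Splits the input into NUMERIC and DOT tokens, taking the longest valid token each time. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int start = pos;
            if (isDigit(input, pos)) {
                while (isDigit(input, pos)) pos++;   // integer part
                // Enter the fraction only if a digit follows the '.';
                // otherwise the '.' stays in the input and becomes a DOT.
                if (pos < input.length() && input.charAt(pos) == '.' && isDigit(input, pos + 1)) {
                    pos++;                           // consume '.'
                    while (isDigit(input, pos)) pos++;
                }
                tokens.add("NUMERIC(" + input.substring(start, pos) + ")");
            } else if (input.charAt(pos) == '.') {
                pos++;
                tokens.add("DOT(.)");
            } else {
                throw new IllegalArgumentException("unexpected character at index " + pos);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("5."));      // [NUMERIC(5), DOT(.)]
        System.out.println(tokenize("123.56"));  // [NUMERIC(123.56)]
    }
}
```

The key design choice is the two-character check before consuming the '.': the scanner looks at the character after the '.' before deciding that a fraction follows.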
I have reduced the grammar to a lexer grammar containing only the two
rules for NUMERIC and DOT:

    lexer grammar Simple;

    options {language=Java;}

    @header
    {
    package test;
    }

    NUMERIC : ('0'..'9')+ ('.' ('0'..'9')+)? ;
    DOT     : '.' ;

Then I tried to print all tokens for the input "5." using the
following simple test runner:

    package test;

    import org.antlr.runtime.ANTLRStringStream;
    import org.antlr.runtime.CommonTokenStream;

    public class Main
    {
        public static void main(final String[] args)
        {
            // Run the generated Simple lexer over the input "5."
            final Simple lexer = new Simple(new ANTLRStringStream("5."));
            final CommonTokenStream stream = new CommonTokenStream(lexer);
            // Print every token the lexer produced
            for(final Object token : stream.getTokens())
            {
                System.out.println(token);
            }
        }
    }

The result is the same. It cannot tokenize that string. It works with
the input strings "5" and "5.5." but not with "5.". Using a syntactic
predicate as you suggested in your previous mail resulted in the same
exception.
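
The quoted explanation (the '.' tells the lexer to look for 0..9, and that fails) can be mimicked with a small plain-Java sketch. This is a hypothetical simplification, not ANTLR's generated code: the class name CommittedLexerSketch and the exception type are assumptions for illustration only. The scanner decides to enter the optional fraction after seeing a single '.', and then has no way back if no digit follows.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical scanner that commits to the fraction after seeing only a '.'. */
class CommittedLexerSketch {

    static boolean isDigit(String s, int i) {
        return i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9';
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int start = pos;
            if (isDigit(input, pos)) {
                while (isDigit(input, pos)) pos++;   // integer part
                // Decision based on one character: a '.' means "enter the fraction".
                if (pos < input.length() && input.charAt(pos) == '.') {
                    pos++;                           // '.' is consumed, no way back
                    if (!isDigit(input, pos)) {
                        throw new IllegalStateException("expected digit at index " + pos);
                    }
                    while (isDigit(input, pos)) pos++;
                }
                tokens.add("NUMERIC(" + input.substring(start, pos) + ")");
            } else if (input.charAt(pos) == '.') {
                pos++;
                tokens.add("DOT(.)");
            } else {
                throw new IllegalArgumentException("unexpected character at index " + pos);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("5"));       // [NUMERIC(5)]
        System.out.println(tokenize("5.5."));    // [NUMERIC(5.5), DOT(.)]
        try {
            tokenize("5.");
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());  // failed: expected digit at index 2
        }
    }
}
```

Under this assumption the sketch shows the same pattern as reported: "5" and "5.5." tokenize, while "5." fails because the '.' has already been consumed when the missing digit is noticed.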
Regards,
Thomas
--
Interactive Objects Software GmbH
Basler Strasse 61
79100 Freiburg, Germany
Phone: +49 761 400 73 0
mailto:thomas.woelfle at interactive-objects.com