[antlr-interest] Lexing problem - range of ints vs float

Tue Sep 11 09:01:48 PDT 2012

Your rules for FLOAT are more restrictive than mine, but I had a similar problem not long ago: http://www.antlr.org/pipermail/antlr-interest/2012-August/045132.html

Jim Idle pointed out this FAQ entry (http://tinyurl.com/8t4pnhv), which led me to the following lexer rules:

// Matching for DOT tokens is handled in FLOAT, below.
// Notionally: DOT: '.';
fragment DOT: ;
// Matching for TO tokens is handled in FLOAT, below.
// Notionally: TO: '..';
fragment TO: ;

fragment DIGIT: '0'..'9';
fragment SIGN: '-'?;

// Matching for INT tokens is handled in FLOAT, below.
// Notionally: INT: SIGN DIGIT+;
fragment INT: ;

fragment FLOAT_EXPONENT
    :   'e' ('+'|'-')? DIGIT+
    ;

// Complex rule tree deciding several logical lexer rules: INT, FLOAT, DOT,
// and TO. See http://www.antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point%2C+dot%2C+range%2C+time+specs
// for the origin of the idea. This is required because of the ambiguity of
// inputs like "1..3" -- in some contexts, it can be a range; in others, it's
// two floats. We always parse it as a range, which is consistent with the C
// implementation.
FLOAT
    :   SIGN DIGIT+ ( // Leading sign and digits: might be FLOAT, might be INT.
            // Two dots means an INT followed by a TO.
            { self.input.LA(2) != ord('.') }?=> '.' (
                DIGIT* FLOAT_EXPONENT? { $type = FLOAT; }
            )
            | FLOAT_EXPONENT { $type = FLOAT; }
            | { $type = INT; }
        )
        | '-' '.' ( // Leading sign and dot, must be float.
            DIGIT+ FLOAT_EXPONENT? { $type = FLOAT; }
        )
        | '.' ( // Leading dot: might be FLOAT, DOT, or TO.
            '.' { $type = TO; }
            | DIGIT+ FLOAT_EXPONENT? { $type = FLOAT; }
            | { $type = DOT; }
        )
    ;
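To see why this single combined rule turns '0..5' into INT, TO, INT, here is a minimal hand-written Python sketch of the decision the FLOAT rule makes. It is not ANTLR-generated code and does not use the ANTLR runtime. The function name `tokenize`, its helpers, and the (type, text) tuple output are hypothetical. Whitespace handling is omitted, and any character the rule cannot start with raises ValueError.

```python
from typing import List, Tuple


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Hypothetical mirror of the FLOAT rule above; returns (type, lexeme) pairs."""
    tokens: List[Tuple[str, str]] = []
    n = len(text)
    i = 0

    def is_digit(ch: str) -> bool:
        # ASCII digits only; '' (end of input) is not a digit.
        return ch != '' and '0' <= ch <= '9'

    def la(k: int) -> str:
        # Like self.input.LA(k): k-th char from the current position, '' at EOF.
        j = i + k - 1
        return text[j] if j < n else ''

    def digits(j: int) -> int:
        # Consume DIGIT* starting at j; return the index after the last digit.
        while j < n and is_digit(text[j]):
            j += 1
        return j

    def exponent(j: int) -> int:
        # Optional FLOAT_EXPONENT: 'e' ('+'|'-')? DIGIT+ starting at j.
        if j < n and text[j] == 'e':
            k = j + 1
            if k < n and text[k] in '+-':
                k += 1
            end = digits(k)
            if end == k:
                raise ValueError(f"malformed exponent at {j}")
            return end
        return j

    while i < n:
        c = text[i]
        start = i
        if is_digit(c) or (c == '-' and is_digit(la(2))):
            # Alt 1: SIGN DIGIT+, then decide FLOAT vs INT.
            j = digits(i + 1 if c == '-' else i)
            # Gated predicate: take the '.' only if the next char is not '.'.
            if j < n and text[j] == '.' and (j + 1 >= n or text[j + 1] != '.'):
                j, kind = exponent(digits(j + 1)), 'FLOAT'
            elif j < n and text[j] == 'e':
                j, kind = exponent(j), 'FLOAT'
            else:
                kind = 'INT'  # "0.." falls through here: INT, then TO
        elif c == '-' and la(2) == '.':
            # Alt 2: '-' '.' DIGIT+ FLOAT_EXPONENT?
            j = digits(i + 2)
            if j == i + 2:
                raise ValueError(f"expected digit after '-.' at {i}")
            j, kind = exponent(j), 'FLOAT'
        elif c == '.':
            # Alt 3: leading dot, so TO, FLOAT, or DOT.
            if la(2) == '.':
                j, kind = i + 2, 'TO'
            elif is_digit(la(2)):
                j, kind = exponent(digits(i + 1)), 'FLOAT'
            else:
                j, kind = i + 1, 'DOT'
        else:
            raise ValueError(f"unexpected character {c!r} at {i}")
        tokens.append((kind, text[start:j]))
        i = j
    return tokens
```

With this sketch, `tokenize("0..5")` gives `[('INT', '0'), ('TO', '..'), ('INT', '5')]`, while `tokenize("1.5e3")` gives a single FLOAT. The important part is the second character of lookahead: "0." only becomes a FLOAT prefix when the character after the dot is not another dot.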

Hope that helps,

-o

On 2012-09-11, at 11:55 AM, kjam <pohilets at gmail.com> wrote:

> Hi, All!
> 
> I want the lexer to interpret the input '0..5' as the sequence of tokens INT, '..', INT.
> Instead, it sees '0.', tries to continue lexing a FLOAT, and fails. I
> have the '..' token defined implicitly in the parser grammar, and the following lexer
> rules for INT and FLOAT:
> 
> INT
>    : ('0d')? '0'..'9'+
>    | '0b' '0'..'1'+
>    | '0c' '0'..'7'+
>    | '0x' HEX_DIGIT+
>    ;
> 
> FLOAT
>    : ('0'..'9')+ '.' ('0'..'9')+ EXPONENT?
>    | ('0'..'9')+ EXPONENT
>    ;
> 
> fragment
> EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
> 
> fragment
> HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;