[antlr-interest] Lexing problem - range of ints vs float
Owen Jacobson
owen.jacobson at grimoire.ca
Tue Sep 11 09:01:48 PDT 2012
Your rules for FLOAT are more restrictive than mine, but I had a similar problem not long ago: http://www.antlr.org/pipermail/antlr-interest/2012-August/045132.html
Jim Idle pointed out this FAQ entry (http://tinyurl.com/8t4pnhv), which led me to the following lexer rules:
// Matching for DOT tokens is handled in FLOAT, below.
// Notionally: DOT: '.';
fragment DOT: ;
// Matching for TO tokens is handled in FLOAT, below.
// Notionally: TO: '..';
fragment TO: ;
fragment DIGIT: '0'..'9';
fragment SIGN: '-'?;
// Matching for INT tokens is handled in FLOAT, below.
// Notionally: INT: SIGN DIGIT+;
fragment INT: ;
fragment FLOAT_EXPONENT
    : 'e' ('+'|'-')? DIGIT+
    ;
// Complex rule tree deciding several logical lexer rules: INT, FLOAT, DOT,
// and TO. See http://www.antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point%2C+dot%2C+range%2C+time+specs
// for the origin of the idea. This is required because of the ambiguity of
// inputs like "1..3" -- in some contexts, it can be a range; in others, it's
// two floats. We always parse it as a range, which is consistent with the C
// implementation.
FLOAT
    : SIGN DIGIT+ ( // Leading sign and digits: might be FLOAT, might be INT.
        // Two dots means an INT followed by a TO.
        { self.input.LA(2) != ord('.') }?=> '.' (
            DIGIT* FLOAT_EXPONENT? { $type = FLOAT; }
        )
        | FLOAT_EXPONENT { $type = FLOAT; }
        | { $type = INT; }
    )
    | '-' '.' ( // Leading sign and dot, must be float.
        DIGIT+ FLOAT_EXPONENT? { $type = FLOAT; }
    )
    | '.' ( // Leading dot: might be FLOAT, DOT, or TO.
        '.' { $type = TO; }
        | DIGIT+ FLOAT_EXPONENT? { $type = FLOAT; }
        | { $type = DOT; }
    )
    ;
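
If it helps to see the decision tree outside of ANTLR, here is a rough hand-written Python sketch of the same idea. It is not what ANTLR generates; the names next_token and tokenize are made up for illustration, it only understands the numeric and dot tokens from the rule above, and (unlike a generated lexer) it quietly leaves an incomplete exponent such as "1e" unconsumed instead of reporting an error.

```python
DIGITS = "0123456789"


def _digits(text, pos):
    """Return the index just past any run of digits starting at pos."""
    while pos < len(text) and text[pos] in DIGITS:
        pos += 1
    return pos


def _exponent(text, pos):
    """Return the index past a complete 'e' [+-] DIGIT+ suffix, or pos if none."""
    if pos < len(text) and text[pos] == "e":
        p = pos + 1
        if p < len(text) and text[p] in "+-":
            p += 1
        end = _digits(text, p)
        if end > p:
            return end
    return pos


def next_token(text, pos):
    """Hypothetical helper: return (type, lexeme, new_pos) for the token at pos."""
    start = pos
    p = pos + 1 if text[pos] == "-" else pos
    after_digits = _digits(text, p)
    if after_digits > p:
        # SIGN DIGIT+ ...: might be FLOAT, might be INT.
        p = after_digits
        # Same check as the gated predicate: a single dot continues the
        # number, two dots mean an INT followed by a TO.
        if text[p:p + 1] == "." and text[p + 1:p + 2] != ".":
            p = _exponent(text, _digits(text, p + 1))
            return "FLOAT", text[start:p], p
        exp_end = _exponent(text, p)
        if exp_end > p:
            return "FLOAT", text[start:exp_end], exp_end
        return "INT", text[start:p], p
    if text[p:p + 1] == ".":
        frac_end = _digits(text, p + 1)
        if frac_end > p + 1:
            # '.' DIGIT+ (with or without a leading '-') must be a FLOAT.
            end = _exponent(text, frac_end)
            return "FLOAT", text[start:end], end
        if p == start:
            # Unsigned leading dot with no digits: TO or DOT.
            if text[p + 1:p + 2] == ".":
                return "TO", "..", p + 2
            return "DOT", ".", p + 1
    raise ValueError("unexpected input at offset %d" % start)


def tokenize(text):
    """Hypothetical helper: split text into (type, lexeme) pairs, skipping spaces."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        kind, lexeme, pos = next_token(text, pos)
        tokens.append((kind, lexeme))
    return tokens
```

With this sketch, tokenize("0..5") should come out as INT, TO, INT, which is the token sequence kjam is after, while "0.5" still comes out as a single FLOAT.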
Hope that helps,
-o
On 2012-09-11, at 11:55 AM, kjam <pohilets at gmail.com> wrote:
> Hi, All!
>
> I want lexer to interpret input '0..5' as sequence of tokens INT, '..', INT.
> But instead it sees '0.' and tries to continue parsing float and fails. I
> have '..' token implicitly defined in parser grammar and following lexer
> rules for INT and FLOAT:
>
> INT
> : ('0d')? '0'..'9'+
> | '0b' '0'..'1'+
> | '0c' '0'..'7'+
> | '0x' HEX_DIGIT+
> ;
>
> FLOAT
> : ('0'..'9')+ '.' ('0'..'9')+ EXPONENT?
> | ('0'..'9')+ EXPONENT
> ;
>
> fragment
> EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
>
> fragment
> HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
>