[antlr-interest] lexer: matching float vs int
Sam Harwell
sharwell at pixelminegames.com
Tue Sep 9 11:10:06 PDT 2008
Here is the numeric lexer section of a C++ grammar. This method is very
different from Jim's method, as it doesn't use any semantic predicates,
and has no dependency on the target language. I'm curious how it
compares performance-wise. Since there are a lot of rules here, I'll
mention the 3 non-fragment rules are:
INTEGER_CONSTANT
CHARACTER_CONSTANT
FLOATING_CONSTANT
/////////////////////////////////////////////////
/////////////////////////////////////////////////
//
// LEXER
//
// numbers
fragment
LETTER
: 'a'..'z'
| 'A'..'Z'
;
fragment
NONDIGIT
: '_'
| 'a'..'z'
| 'A'..'Z'
;
fragment
DIGIT
: '0'..'9'
;
INTEGER_CONSTANT
: DECIMAL_CONSTANT INTEGER_SUFFIX?
| OCTAL_CONSTANT INTEGER_SUFFIX?
| HEXADECIMAL_CONSTANT INTEGER_SUFFIX?
| '\'' C_CHAR_SEQUENCE '\''
;
fragment
DECIMAL_CONSTANT
: NONZERO_DIGIT DIGIT*
;
fragment
OCTAL_CONSTANT
: '0' OCTAL_DIGIT*
;
fragment
HEXADECIMAL_CONSTANT
: '0x' HEXADECIMAL_DIGIT+
| '0X' HEXADECIMAL_DIGIT+
;
fragment
NONZERO_DIGIT
: '1'..'9'
;
fragment
OCTAL_DIGIT
: '0'..'7'
;
fragment
HEXADECIMAL_DIGIT
: '0'..'9'
| 'a'..'f'
| 'A'..'F'
;
fragment
INTEGER_SUFFIX
: UNSIGNED_SUFFIX LONG_SUFFIX?
| LONG_SUFFIX UNSIGNED_SUFFIX?
;
fragment
UNSIGNED_SUFFIX
: 'u' | 'U'
;
fragment
LONG_SUFFIX
: 'l' | 'L'
;
CHARACTER_CONSTANT
: 'L'? '\'' C_CHAR_SEQUENCE '\''
;
fragment
C_CHAR_SEQUENCE
: C_CHAR+
;
fragment
C_CHAR
: ~('\'' | '\\' | '\r' | '\n')
| ESCAPE_SEQUENCE
;
fragment
ESCAPE_SEQUENCE
: SIMPLE_ESCAPE_SEQUENCE
| OCTAL_ESCAPE_SEQUENCE
| HEXADECIMAL_ESCAPE_SEQUENCE
;
fragment
SIMPLE_ESCAPE_SEQUENCE
: '\\\''
| '\\\"'
| '\\?'
| '\\\\'
| '\\a'
| '\\b'
| '\\f'
| '\\n'
| '\\r'
| '\\t'
| '\\v'
;
fragment
OCTAL_ESCAPE_SEQUENCE
: '\\' OCTAL_DIGIT (OCTAL_DIGIT OCTAL_DIGIT?)?
;
fragment
HEXADECIMAL_ESCAPE_SEQUENCE
: '\\x' HEXADECIMAL_DIGIT+
;
FLOATING_CONSTANT
: FRACTIONAL_CONSTANT EXPONENT_PART? FLOATING_SUFFIX?
| DIGIT_SEQUENCE EXPONENT_PART FLOATING_SUFFIX?
;
fragment
FRACTIONAL_CONSTANT
: DIGIT_SEQUENCE '.' DIGIT_SEQUENCE?
| '.' DIGIT_SEQUENCE
;
fragment
EXPONENT_PART
: 'e' SIGN? DIGIT_SEQUENCE
| 'E' SIGN? DIGIT_SEQUENCE
;
fragment
SIGN
: '+'
| '-'
;
fragment
DIGIT_SEQUENCE
: DIGIT+
;
fragment
FLOATING_SUFFIX
: 'f'
| 'l'
| 'F'
| 'L'
;
-----Original Message-----
From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Olya Krachina
Sent: Monday, September 08, 2008 11:01 PM
To: antlr-interest at antlr.org
Subject: [antlr-interest] lexer: matching float vs int
Hello,
I am new to antlr and i seem to be stuck on this.
i need to have 2 datatypes defined: int and float, currently i have them
defined
like this in my .g file:
INT: ('0'..'9')+;
FLOAT: ('0'..'9')*('.')('0'..'9')+ ;
So, this does not work, when it comes across an int i think it tries to
match
the longest string, i.e. float but finds space instead of '.' (since its
an int)
and bails out.
ps: i know this is more a regexp question, but if someone could help
out, I
would greatly appreciate it.
thanks
List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe:
http://www.antlr.org/mailman/options/antlr-interest/your-email-addr
ess
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20080909/9d45413c/attachment.html
More information about the antlr-interest
mailing list