[antlr-interest] lexer: matching float vs int

Tue Sep 9 11:10:06 PDT 2008

Here is the numeric lexer section of a C++ grammar. This method is very
different from Jim's method: it doesn't use any semantic predicates and
has no dependency on the target language. I'm curious how it compares
performance-wise. Since there are a lot of rules here, note that the only
three non-fragment (token) rules are:

INTEGER_CONSTANT
CHARACTER_CONSTANT
FLOATING_CONSTANT

/////////////////////////////////////////////////
/////////////////////////////////////////////////
//
//  LEXER
//
// numbers

fragment
LETTER
        :       'a'..'z'
        |       'A'..'Z'
        ;

fragment
NONDIGIT
        :       '_'
        |       'a'..'z'
        |       'A'..'Z'
        ;

fragment
DIGIT
        :       '0'..'9'
        ;

INTEGER_CONSTANT
        :       DECIMAL_CONSTANT INTEGER_SUFFIX?
        |       OCTAL_CONSTANT INTEGER_SUFFIX?
        |       HEXADECIMAL_CONSTANT INTEGER_SUFFIX?
        ;

fragment
DECIMAL_CONSTANT
        :       NONZERO_DIGIT DIGIT*
        ;

fragment
OCTAL_CONSTANT
        :       '0' OCTAL_DIGIT*
        ;

fragment
HEXADECIMAL_CONSTANT
        :       '0x' HEXADECIMAL_DIGIT+
        |       '0X' HEXADECIMAL_DIGIT+
        ;

fragment
NONZERO_DIGIT
        :       '1'..'9'
        ;

fragment
OCTAL_DIGIT
        :       '0'..'7'
        ;

fragment
HEXADECIMAL_DIGIT
        :       '0'..'9'
        |       'a'..'f'
        |       'A'..'F'
        ;

fragment
INTEGER_SUFFIX
        :       UNSIGNED_SUFFIX LONG_SUFFIX?
        |       LONG_SUFFIX UNSIGNED_SUFFIX?
        ;

fragment
UNSIGNED_SUFFIX
        :       'u' | 'U'
        ;

fragment
LONG_SUFFIX
        :       'l' | 'L'
        ;

CHARACTER_CONSTANT
        :       'L'? '\'' C_CHAR_SEQUENCE '\''
        ;

fragment
C_CHAR_SEQUENCE
        :       C_CHAR+
        ;

fragment
C_CHAR
        :       ~('\'' | '\\' | '\r' | '\n')
        |       ESCAPE_SEQUENCE
        ;

fragment
ESCAPE_SEQUENCE
        :       SIMPLE_ESCAPE_SEQUENCE
        |       OCTAL_ESCAPE_SEQUENCE
        |       HEXADECIMAL_ESCAPE_SEQUENCE
        ;

fragment
SIMPLE_ESCAPE_SEQUENCE
        :       '\\\''
        |       '\\\"'
        |       '\\?'
        |       '\\\\'
        |       '\\a'
        |       '\\b'
        |       '\\f'
        |       '\\n'
        |       '\\r'
        |       '\\t'
        |       '\\v'
        ;

fragment
OCTAL_ESCAPE_SEQUENCE
        :       '\\' OCTAL_DIGIT (OCTAL_DIGIT OCTAL_DIGIT?)?
        ;

fragment
HEXADECIMAL_ESCAPE_SEQUENCE
        :       '\\x' HEXADECIMAL_DIGIT+
        ;

FLOATING_CONSTANT
        :       FRACTIONAL_CONSTANT EXPONENT_PART? FLOATING_SUFFIX?
        |       DIGIT_SEQUENCE EXPONENT_PART FLOATING_SUFFIX?
        ;

fragment
FRACTIONAL_CONSTANT
        :       DIGIT_SEQUENCE '.' DIGIT_SEQUENCE?
        |       '.' DIGIT_SEQUENCE
        ;

fragment
EXPONENT_PART
        :       'e' SIGN? DIGIT_SEQUENCE
        |       'E' SIGN? DIGIT_SEQUENCE
        ;

fragment
SIGN
        :       '+'
        |       '-'
        ;

fragment
DIGIT_SEQUENCE
        :       DIGIT+
        ;

fragment
FLOATING_SUFFIX
        :       'f'
        |       'l'
        |       'F'
        |       'L'
        ;
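The lexer that ANTLR 3 generates decides between these token rules by lookahead prediction rather than by running each rule in turn. As a rough, target-independent way to check which token a given lexeme should become, here is a Python sketch: my own regex translation of the three token rules (not generated code), plus a hypothetical helper `next_token` that keeps the longest match. The character-literal form is handled only by CHARACTER_CONSTANT in this sketch.

```python
import re

# Regex translations of the three token rules; the DIGIT, OCTAL_DIGIT and
# HEXADECIMAL_DIGIT fragments become character classes.

# INTEGER_CONSTANT: hex is tried first so "0x1F" is not cut short at the
# octal "0"; then decimal, then octal, each with an optional u/l suffix.
INTEGER_SUFFIX = r"(?:[uU][lL]?|[lL][uU]?)"
INTEGER_CONSTANT = rf"(?:0[xX][0-9a-fA-F]+|[1-9][0-9]*|0[0-7]*){INTEGER_SUFFIX}?"

# CHARACTER_CONSTANT: optional L prefix, one or more C_CHARs in single quotes.
ESCAPE_SEQUENCE = r"(?:\\['\"?\\abfnrtv]|\\[0-7]{1,3}|\\x[0-9a-fA-F]+)"
C_CHAR = rf"(?:[^'\\\r\n]|{ESCAPE_SEQUENCE})"
CHARACTER_CONSTANT = rf"L?'{C_CHAR}+'"

# FLOATING_CONSTANT: a fractional constant with an optional exponent, or a
# digit sequence with a mandatory exponent; optional f/l/F/L suffix.
EXPONENT_PART = r"[eE][+-]?[0-9]+"
FRACTIONAL_CONSTANT = r"(?:[0-9]+\.[0-9]*|\.[0-9]+)"
FLOATING_CONSTANT = (
    rf"(?:{FRACTIONAL_CONSTANT}(?:{EXPONENT_PART})?|[0-9]+{EXPONENT_PART})[flFL]?"
)

TOKEN_RULES = [
    ("INTEGER_CONSTANT", re.compile(INTEGER_CONSTANT)),
    ("CHARACTER_CONSTANT", re.compile(CHARACTER_CONSTANT)),
    ("FLOATING_CONSTANT", re.compile(FLOATING_CONSTANT)),
]


def next_token(text, pos=0):
    """Return (token_type, lexeme) for the longest token starting at pos.

    Every rule is tried at the same position and the longest match wins;
    on a tie the rule listed first in TOKEN_RULES is kept.
    """
    best = None  # (token_type, lexeme, end)
    for token_type, pattern in TOKEN_RULES:
        m = pattern.match(text, pos)
        if m and (best is None or m.end() > best[2]):
            best = (token_type, m.group(), m.end())
    if best is None:
        raise ValueError(f"no numeric or character token at position {pos}")
    return best[0], best[1]


for sample in ["42", "0x1Fu", "10UL", "12.5", "12e3", "09.5", ".5f", "'a'", "L'\\n'"]:
    print(sample, "->", next_token(sample))
```

With these translations, `12.5`, `12e3` and `09.5` come back as FLOATING_CONSTANT while `42` and `10UL` stay INTEGER_CONSTANT: the integer pattern only covers the leading digits, so the longer floating match wins whenever a `.` or exponent follows.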

-----Original Message-----
From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Olya Krachina
Sent: Monday, September 08, 2008 11:01 PM
To: antlr-interest at antlr.org
Subject: [antlr-interest] lexer: matching float vs int

Hello,

I am new to ANTLR and I seem to be stuck on this.

I need to have two datatypes defined, int and float. Currently I have them
defined like this in my .g file:

INT:      ('0'..'9')+;
FLOAT:    ('0'..'9')*('.')('0'..'9')+ ;

So, this does not work. When it comes across an int, I think it tries to
match the longest string, i.e. a float, but finds a space instead of '.'
(since it's an int) and bails out.
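The behaviour being asked for here is "maximal munch": at each position the lexer should try both rules, keep the longest match, and fall back to INT when FLOAT cannot complete. As a plain illustration of that idea (a hand-written Python loop, not what ANTLR generates), here are the two rules above as regular expressions, plus a hypothetical WS rule for the separating whitespace:

```python
import re

# Regex equivalents of the two rules in the question, plus a hypothetical
# WS rule so that spaces between numbers can be consumed and dropped.
TOKEN_RULES = [
    ("INT", re.compile(r"[0-9]+")),
    ("FLOAT", re.compile(r"[0-9]*\.[0-9]+")),
    ("WS", re.compile(r"[ \t\r\n]+")),
]


def tokenize(text):
    """Maximal-munch lexer: at each position keep the longest rule match.

    Ties go to the rule listed first in TOKEN_RULES.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        best_type, best_end = None, pos
        for token_type, pattern in TOKEN_RULES:
            m = pattern.match(text, pos)
            if m and m.end() > best_end:
                best_type, best_end = token_type, m.end()
        if best_type is None:
            raise ValueError(f"no token matches at position {pos}: {text[pos]!r}")
        if best_type != "WS":  # discard whitespace instead of emitting it
            tokens.append((best_type, text[pos:best_end]))
        pos = best_end
    return tokens


print(tokenize("12 3.5 .25 7"))
```

Here FLOAT simply fails to match `12 ` (no `.` follows the digits), so the longer-match comparison falls back to INT. Because FLOAT requires at least one digit after the `.`, an input such as `10.` yields INT `10` and then raises at the dot.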

PS: I know this is more of a regexp question, but if someone could help
out, I would greatly appreciate it.

Thanks
