[antlr-interest] How is the floating point literal example from wiki supposed to work?

Tue Jan 3 09:37:23 PST 2012

Greetings,
This example from the wiki seems to handle a use case that has cost me some
black hair (some pulled out, some turned grey):
http://www.antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point,+dot,+range,+time+specs

The example declares several fragment rules in the lexer, then uses a single
rule to match the contents of the input stream, and finally sets that rule's
$type to one of the fragment token types.

This looks like a very generic use case: I have many lexer rules that are
supposed to be more constrained versions of one big, generic rule. For
example, capital letters in English are a subset of all printable
characters in ASCII. The approach in the example changes the token type and
sends the token to the parser.
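
To make the pattern concrete outside ANTLR, here is a minimal plain-Java
sketch of what I mean by retyping a token matched by a generic rule. It is
only a model of the idea: it does not use the ANTLR runtime, and the names
RetypeSketch, TokenType, Token and tokenize are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of "one generic rule, retyped after matching".
// Nothing here is ANTLR API; all names are hypothetical.
final class RetypeSketch {

    // GENERIC is the default type of the main rule; CAPITALS plays the role
    // of an empty fragment rule that exists only to provide a token type.
    enum TokenType { GENERIC, CAPITALS }

    // A token is its text plus the (possibly reassigned) type.
    static final class Token {
        final TokenType type;
        final String text;

        Token(TokenType type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    // Every whitespace-separated word is matched by the single generic rule,
    // then retyped if it satisfies the narrower constraint (all 'A'..'Z').
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        for (String word : input.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // empty input yields one empty string from split
            }
            TokenType type = TokenType.GENERIC;
            if (word.chars().allMatch(c -> c >= 'A' && c <= 'Z')) {
                type = TokenType.CAPITALS; // analogous to {$type = CAPITALS;}
            }
            tokens.add(new Token(type, word));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("HELLO world 42")) {
            System.out.println(t.type + " " + t.text);
        }
    }
}
```

The consumer of these tokens would then branch on the type, which is the
part I cannot express cleanly in the ANTLR parser.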

But how on earth is this supposed to be used in the parser? The example
clearly implies that this is a way to handle this use case, but I could
not find a clean way of consuming the retyped tokens in the parser. I've
found one way of doing it, which feels awfully like a hack. I'll insert my
solution at the end.

I've found out that even though the fragment rules are not visible in the
parser, the actions in the parser can access their identifiers. If a token
arrives with a modified type that belongs to a fragment rule, then the
parser fails. So I'm correcting the token's type after I catch it with a
parser rule that is supposed to represent the fragment rule from the lexer.
Is this a sane solution? Am I missing something obvious here? This must be
a very common use case in building parsers, but I can't seem to find the
intended way to handle it.

Best regards
Seref

PS: This is my horrible solution that does the token type trick. It is a
brutally simplified version of the wiki example:

grammar TstForNums;

expr    :    dot;

dot
    :    {input.LT(1).getType() == TstForNumsParser.DOT}?
         {input.LT(1).setType(TstForNumsParser.FLOATING_POINT_LITERAL);}
         FLOATING_POINT_LITERAL
    ;

// these are the token types that will be assigned to the actual rule
fragment    TIME_LITERAL        :   ;

fragment    DECIMAL_LITERAL     :   ;

fragment    OCTAL_LITERAL       :   ;

fragment    HEX_LITERAL         :   ;

fragment    DOTDOT              :   ;

fragment    DOT                 :   ;

// This is the main rule that does the processing.
// Let's set the type to DECIMAL_LITERAL. This is a very simplified form of
// the example from the wiki;
// it only shows how a rule's type can be changed here.
FLOATING_POINT_LITERAL
    :    Digits {$type = DECIMAL_LITERAL;}
    ;

fragment
Digits
    :   ('0'..'9')+
    ;