[antlr-interest] How is the floating point literal example from wiki supposed to work?

Tue Jan 3 13:07:16 PST 2012

Thanks Jim,
You have beaten me to it. I was going to reply to my own question after
finding out that I could write parser rules such as rul : Fragment_Rule;

Your feedback is helpful beyond my question, though. Could you briefly
outline a parser-focused solution, please? Are you implying that I should
process token information from parser rules? It is really tricky to
divide the work between the lexer and the parser.

Cheers
Seref
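
A minimal sketch of the "capture loosely, verify afterwards" idea from Jim's reply quoted below, in plain Java so it runs without the ANTLR runtime. The class and method names (LooseTokenCheck, checkLeadingZeros) are hypothetical and not part of ANTLR; the assumption is that the lexer accepts any run of digits and that the check is called later with the token's text and position.

```java
import java.util.Optional;

// Hypothetical helper, not part of ANTLR: validates the text of a token that
// the lexer matched with a deliberately loose rule (here: one or more digits).
final class LooseTokenCheck {

    private LooseTokenCheck() {}

    // Returns a descriptive error message if the text has a redundant leading
    // zero, or Optional.empty() if it is acceptable. In an ANTLR 3 action the
    // arguments would come from Token.getText(), Token.getLine() and
    // Token.getCharPositionInLine().
    static Optional<String> checkLeadingZeros(String text, int line, int col) {
        if (text.length() > 1 && text.charAt(0) == '0') {
            return Optional.of("The identifier '" + text + "' at line " + line
                    + ", col " + col + ", cannot start with leading zeros");
        }
        return Optional.empty();
    }
}
```

In a grammar, a message like this would presumably be reported from a parser action or a later pass over the tokens, rather than by making the lexer rule itself stricter.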

On Tuesday, January 3, 2012, Jim Idle <jimi at temporal-wave.com> wrote:
> The main rule in the example returns the token type as assigned by $type
> =. So you just refer to those types directly in the parser rules. Fragment
> rules are not returned directly by the lexer but the parser has access to
> the token types.
>
> So:
>
> literals
>  : TIME_LITERAL
>  | DECIMAL_LITERAL
>
> and so on.
>
> However, if you are just trying to restrict things like upper case or non
> zero digits, then you probably don't want to do that directly in the lexer
> anyway as then you will just throw out an ambiguous error to your users,
> such as "Invalid char" whereas if you encode the verification after you
> have captured a fairly loose definition of it, then you can say "The
> identifier '00033343' at line 4, col 55, cannot start with leading zeros"
> and so on. Your users will like you a lot better for that.
>
> Jim
>
>> -----Original Message-----
>> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org]
>> On Behalf Of Seref Arikan
>> Sent: Tuesday, January 03, 2012 9:37 AM
>> To: antlr-interest at antlr.org
>> Subject: [antlr-interest] How is the floating point literal example
>> from wiki supposed to work?
>>
>> Greetings,
>> This example from the wiki seems to handle a use case that has cost me
>> some black hair (some pulled out, some turned grey...) :
>> http://www.antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point,+dot,+range,+time+specs
>>
>> The example uses various fragment rules in the lexer, then uses one
>> rule to work on the contents of input stream, and then it sets $type of
>> the rule to one of the fragment types.
>>
>> This looks like a very generic use case: I have many lexer rules which
>> are supposed to be more constrained versions of one big, generic rule.
>> For example, capital letters in English, as a subset of all printable
>> characters in ASCII. The approach in the example changes the token type
>> and sends it to the parser.
>>
>> But how on earth is this supposed to be used in the parser? The example
>> clearly implies that this is a method to handle this use case, but I
>> could not find a clean way of doing this in the parser. I've found a
>> way of doing it, which feels awfully like a hack. I'll insert my
>> solution at the end.
>>
>> I've found out that even though the fragment rules are not visible in
>> the parser, the actions in the parser can access their identifiers. If
>> a token arrives with a modified type that belongs to a fragment rule,
>> then the parser fails. So I'm correcting the token's type after I catch
>> it with a parser rule that is supposed to represent the fragment rule
>> from the lexer.
>> Is this a sane solution? Am I missing something obvious here? This must
>> be a very common use case in building parsers, but I can't seem to find
>> the right method to handle it.
>>
>> Best regards
>> Seref
>>
>> Ps: this is my horrible solution that does the token type trick. It is
>> a brutally simplified version of the wiki example:
>>
>> grammar TstForNums;
>>
>> expr    :    dot;
>>
>> dot    :    {input.LT(1).getType() == TstForNumsParser.DOT}?
>> {input.LT(1).setType(TstForNumsParser.FLOATING_POINT_LITERAL);}
>> FLOATING_POINT_LITERAL
>>     ;
>>
>> //these would be our types that will be assigned to actual rule
>> fragment    TIME_LITERAL        :   ;
>>
>> fragment    DECIMAL_LITERAL     :   ;
>>
>> fragment    OCTAL_LITERAL       :   ;
>>
>> fragment    HEX_LITERAL         :   ;
>>
>> fragment        DOTDOT                  :       ;
>>
>> fragment        DOT                     :       ;
>>
>> //this is the main rule that does the processing
>> //let's set the type to decimal_literal. This is a very simplified form
>> //of the example from the wiki
>> //it only shows how a rule's type can be changed here.
>> FLOATING_POINT_LITERAL
>>     :    Digits {$type = DECIMAL_LITERAL;}
>>     ;
>>
>>
>> fragment
>> Digits
>>     :   ('0'..'9')+
>>     ;
>>
>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address