[antlr-interest] How is the floating point literal example from wiki supposed to work?

Tue Jan 3 11:49:59 PST 2012

The main rule in the example returns whatever token type its action
assigns with $type = ...; so you just refer to those types directly in
the parser rules. Fragment rules are never emitted as tokens by the lexer
on their own, but the parser still has access to their token types.

So:

literals
  : TIME_LITERAL
  | DECIMAL_LITERAL

and so on.
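
As a rough, untested sketch (assuming ANTLR 3 with the Java target), the
cut-down grammar from the original message could refer to the retyped
token directly, with no predicate and no setType() call in the parser.
The rule name "literal" is made up for illustration, and only
DECIMAL_LITERAL is kept:

```antlr
grammar TstForNums;

expr
    : literal
    ;

// The parser refers to the type that the lexer action assigns,
// not to the name of the lexer rule that did the matching.
literal
    : DECIMAL_LITERAL
    ;

// An empty fragment rule: it never matches input by itself and
// exists only so that the token type DECIMAL_LITERAL is defined.
fragment DECIMAL_LITERAL : ;

// The main rule does the matching, then retypes the token.
FLOATING_POINT_LITERAL
    : Digits { $type = DECIMAL_LITERAL; }
    ;

fragment
Digits
    : ('0'..'9')+
    ;
```

With this layout the parser never sees a FLOATING_POINT_LITERAL token at
all, so nothing needs correcting on the parser side.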

However, if you are just trying to restrict things like upper case or
non-zero digits, then you probably don't want to do that directly in the
lexer anyway, as then you will just throw a vague error at your users,
such as "Invalid char". If you instead capture a fairly loose definition
of the token and verify it afterwards, then you can say "The identifier
'00033343' at line 4, col 55, cannot start with leading zeros" and so on.
Your users will like you a lot better for that.
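
A minimal, untested sketch of that idea, again assuming ANTLR 3 with the
Java target. The grammar name, the rule names and the LOOSE_ID token are
all hypothetical; the point is only that the lexer accepts a loose
shape and the parser action does the strict check:

```antlr
grammar LooseIdent;

// Accept the loosely defined token first, then verify it in an action
// so the message can name the offending text and its position.
identifier
    : t=LOOSE_ID
      {
        String text = $t.text;
        if (text.length() > 1 && text.startsWith("0")) {
            System.err.println("The identifier '" + text
                + "' at line " + $t.line + ", col " + $t.pos
                + ", cannot start with leading zeros");
        }
      }
    ;

// Deliberately loose: leading zeros are allowed here and rejected above.
LOOSE_ID
    : ('0'..'9' | 'a'..'z' | 'A'..'Z')+
    ;

WS
    : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; }
    ;
```

Printing to System.err is just the simplest thing that works in a sketch;
a real grammar would more likely route the message through whatever error
reporting it already uses.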

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Seref Arikan
> Sent: Tuesday, January 03, 2012 9:37 AM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] How is the floating point literal example
> from wiki supposed to work?
>
> Greetings,
> This example from the wiki seems to handle a use case that has cost me
> some black hair (some pulled out, some turned grey...) :
> http://www.antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point,+dot,+range,+time+specs
>
> The example uses various fragment rules in the lexer, then uses one
> rule to work on the contents of input stream, and then it sets $type of
> the rule to one of the fragment types.
>
> This looks like a very generic use case: I have many lexer rules which
> are supposed to be more constrained versions of one big/generic rule.
> For example, capital letters in English, as a subset of all printable
> characters in ASCII. The approach in the example changes the token type
> and sends it to the parser.
>
> But how on earth is this supposed to be used in the parser? The example
> clearly implies that this is a method to handle this use case, but I
> could not find a clean way of doing this in the parser. I've found a
> way of doing it, which feels awfully like a hack. I'll insert my
> solution at the end.
>
> I've found out that even though the fragment rules are not visible in
> the parser, the actions in the parser can access their identifiers. If
> a token arrives with a modified type that belongs to a fragment rule,
> then the parser fails. So I'm correcting the token's type after I catch
> it with a parser rule that is supposed to represent the fragment rule
> from the lexer.
> Is this a sane solution? Am I missing something obvious here? This must
> be a very common use case in building parsers, but I can't seem to get
> the method to handle this.
>
> Best regards
> Seref
>
> Ps: this is my horrible solution that does the token type trick. It is
> a brutally simplified version of the wiki example:
>
> grammar TstForNums;
>
> expr    :    dot;
>
> dot    :    {input.LT(1).getType() == TstForNumsParser.DOT}?
> {input.LT(1).setType(TstForNumsParser.FLOATING_POINT_LITERAL);}
> FLOATING_POINT_LITERAL
>     ;
>
> //these would be our types that will be assigned to actual rule
> fragment    TIME_LITERAL        :   ;
>
> fragment    DECIMAL_LITERAL     :   ;
>
> fragment    OCTAL_LITERAL       :   ;
>
> fragment    HEX_LITERAL         :   ;
>
> fragment        DOTDOT                  :       ;
>
> fragment        DOT                     :       ;
>
> // this is the main rule that does the processing
> // let's set the type to decimal_literal. This is a very simplified form
> // of the example from the wiki
> // it only shows how a rule's type can be changed here.
> FLOATING_POINT_LITERAL
>     :    Digits {$type = DECIMAL_LITERAL;}
>     ;
>
>
> fragment
> Digits
>     :   ('0'..'9')+
>     ;
>