[antlr-interest] Relaxed number format lexer problem

Thomas Brandon tbrandonau at gmail.com
Wed Apr 22 11:24:01 PDT 2009


On Thu, Apr 23, 2009 at 2:40 AM, Sam Harwell
<sharwell at pixelminegames.com> wrote:
> I’m using a heavily relaxed NUMBER token in my lexer so I can provide better
> error messages. The problem I’m having occurs when input such as 1..2 is
> reached. The '..' should be an OP_RANGE token, not part of the number.
>
> NUMBER
> @after
> {
> ClassifyNumber($text);
> }
>         :       '.'? '0'..'9'
>                 (       // <-- It throws a NoViableAltException at this decision when next the input is '..'
>                         (       '0'..'9'
>                         |       'a'..'d'
>                         |       'f'..'z'
>                         |       'A'..'D'
>                         |       'F'..'Z'
>                         |       ('.' ~'.') => '.'
>                         )
>                 |       ('e' | 'E')
>                         (       ('+'|'-') => ('+' | '-')?
>                                 '0'..'9'
>                         )?
>                 )*
>         ;
>
I think the problem is that, since ANTLR sees no syntactic ambiguity
here, it won't hoist your "('.' ~'.') =>" predicate into the outer
decision, i.e. the choice between the "( '0'..'9' | 'a'..'d' |
'f'..'z' | 'A'..'D' | 'F'..'Z' | ('.' ~'.') => '.' )" block and the
exponent block. Hence, on seeing a '.', ANTLR enters the first
alternative. Something must then match in the required sub-block, and
when the input is '..' nothing does, so ANTLR throws a
NoViableAltException.

Changing the predicate to a gated semantic predicate ("{ input.LA(2)
!= '.' }?=>") forces it to be hoisted and resolves the problem.
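
As a sketch of that first option, here is the rule from the question
with only the predicate changed. ANTLR 3 writes a gated semantic
predicate as "{...}?=>". This assumes input.LA(2) is available in the
lexer of the target in use, and it is untested as written:

```
NUMBER
@after
{
ClassifyNumber($text);
}
        :       '.'? '0'..'9'
                (       (       '0'..'9'
                        |       'a'..'d'
                        |       'f'..'z'
                        |       'A'..'D'
                        |       'F'..'Z'
                                // gate: only take '.' when the next char is not '.'
                        |       { input.LA(2) != '.' }?=> '.'
                        )
                |       ('e' | 'E')
                        (       ('+'|'-') => ('+' | '-')?
                                '0'..'9'
                        )?
                )*
        ;
```

With the gate false on a '.' that is followed by another '.', the loop
should exit before the "..", leaving it for the OP_RANGE rule.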

Alternatively, removing the sub-block so that you have:
NUMBER
@after
{
ClassifyNumber($text);
}
        :       '.'? '0'..'9'
                (       '0'..'9'
                |       'a'..'d'
                |       'f'..'z'
                |       'A'..'D'
                |       'F'..'Z'
                |       ( '.' ~'.' )=> '.'
                |       ('e' | 'E')
                        (       ('+'|'-') => ('+' | '-')?
                                '0'..'9'
                        )?
                )*
        ;
also fixes it without introducing a semantic predicate, which you
didn't want. The catch is that you can no longer attach actions to the
non-exponent block, if that was your intent for the sub-block.

I don't think this is a bug; it's just that syntactic predicates
aren't hoisted. Gated syntactic predicates, if ANTLR had them, would
solve this. In this case the semantic predicate is simple enough, and
the only thing against it is your wish to avoid one; in other cases,
converting to a semantic predicate could be trickier. That said, I
can't see there being many cases where you'd want to syntactically
disambiguate a decision that ANTLR does not consider syntactically
ambiguous.

Tom.
>
>
> Thank you,
>
> Sam Harwell
>