[antlr-interest] Repeatedly parsing number literals
Rick Mann
rmann at latencyzero.com
Sat Mar 28 22:43:08 PDT 2009
On Mar 28, 2009, at 22:36:31, Gavin Lambert wrote:
> At 15:44 29/03/2009, Rick Mann wrote:
> >DecimalLiteral
> > : '0'..'9' '0'..'9'* { $value = };
> >
> >FloatingPointLiteral
> > : ('0'..'9')+ '.' ('0'..'9')* Exponent?
> > | ('0'..'9')+ Exponent
> > | ('0'..'9')+
> > ;
>
> Note that these rules are lexically ambiguous -- the final alt of
> FloatingPointLiteral is indistinguishable from DecimalLiteral, and
> all of the alternatives share a common left prefix. This is going
> to get you into trouble.
>
> You should rewrite these two rules into a single lexer rule and left-
> factor the common prefix away.
Well, you would think that this is true, but it turns out not to be. I
lifted those rules from Terence's Java grammar. Sure enough, it works
as expected, to the degree that if a parser calls for a float literal,
and I give it a literal that would match DecimalLiteral, it complains.
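
For comparison, the left-factoring Gavin describes can be sketched outside ANTLR. The plain-Java scanner below is not ANTLR-generated code and not taken from Terence's grammar. The class NumberScanner, its Kind enum and the scan method are hypothetical names. It consumes the digit prefix shared by all the number alternatives exactly once, then looks only at what follows ('.' or an exponent) to decide whether the token is a decimal or a floating-point literal. It assumes Exponent is ('e'|'E') ('+'|'-')? ('0'..'9')+, because the quoted grammar does not show that rule, and it treats a trailing 'e' with no digits as not part of the literal.

```java
/**
 * Hypothetical hand-written scanner (not ANTLR output) showing left-factoring:
 * the digit run shared by DecimalLiteral and FloatingPointLiteral is consumed
 * once, and only the characters after it decide the token kind.
 */
public final class NumberScanner {
    public enum Kind { DECIMAL, FLOAT }

    /** A scanned literal: its kind and the matched text. */
    public static final class NumberToken {
        public final Kind kind;
        public final String text;
        NumberToken(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    /** Scans the longest number literal at the start of s. */
    public static NumberToken scan(String s) {
        int i = digits(s, 0);                  // common prefix: ('0'..'9')+
        if (i == 0) {
            throw new IllegalArgumentException("expected a digit: " + s);
        }
        Kind kind = Kind.DECIMAL;
        if (i < s.length() && s.charAt(i) == '.') {
            // '.' ('0'..'9')* Exponent?
            kind = Kind.FLOAT;
            i = exponent(s, digits(s, i + 1));
        } else {
            // Exponent, or nothing at all (plain decimal)
            int j = exponent(s, i);
            if (j > i) {
                kind = Kind.FLOAT;
                i = j;
            }
        }
        return new NumberToken(kind, s.substring(0, i));
    }

    /** Index just past a run of '0'..'9' starting at from (from itself if none). */
    private static int digits(String s, int from) {
        int i = from;
        while (i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9') {
            i++;
        }
        return i;
    }

    /** Index just past an exponent ('e'|'E') ('+'|'-')? digits+, or from if none. */
    private static int exponent(String s, int from) {
        if (from >= s.length() || (s.charAt(from) != 'e' && s.charAt(from) != 'E')) {
            return from;
        }
        int i = from + 1;
        if (i < s.length() && (s.charAt(i) == '+' || s.charAt(i) == '-')) {
            i++;
        }
        int end = digits(s, i);
        return end > i ? end : from;           // "e" with no digits is not an exponent
    }
}
```

With this shape there is nothing ambiguous to resolve: scan("42") yields DECIMAL "42", scan("3.14") and scan("1e10") yield FLOAT, and scan("12e") yields DECIMAL "12".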
> >And a number of parser rules that refer to them. Do I need
> >to write actions like this:
> >
> >$value = Integer.parseInt($DecimalLiteral.text);
>
> Yes. The only return from a lexer rule is the token.
>
> Having said that, you *can* add custom data to a token (exactly how
> you do that depends on your target language; Java requires
> subclassing the token, for example), so it's not completely
> impossible to deal with it at lexing time; but it's usually not
> worth the hassle.
Thanks. I ended up making parser rules "decNum" and "floatNum". I hope
that's kosher.
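
A minimal sketch of the conversion such parser rules might perform, assuming a Java target. LiteralValues, toInt and toDouble are hypothetical names; Integer.parseInt and Double.parseDouble are the standard JDK methods. The grammar fragment in the comment is only a guess at the shape of decNum and floatNum in ANTLR 3, not the actual rules. Keeping the conversion in a helper keeps each grammar action to a single call.

```java
/**
 * Hypothetical helpers for parser actions. In an ANTLR 3 grammar with a Java
 * target they might be called like this (illustrative rules, not the real ones):
 *
 *   decNum returns [int value]
 *       : DecimalLiteral { $value = LiteralValues.toInt($DecimalLiteral.text); } ;
 *   floatNum returns [double value]
 *       : FloatingPointLiteral { $value = LiteralValues.toDouble($FloatingPointLiteral.text); } ;
 */
public final class LiteralValues {
    private LiteralValues() {}  // static helpers only

    /** Converts DecimalLiteral text; throws NumberFormatException if it does not fit in an int. */
    public static int toInt(String text) {
        return Integer.parseInt(text);
    }

    /** Converts FloatingPointLiteral text such as "1.", "2.5e-3" or "6E23". */
    public static double toDouble(String text) {
        return Double.parseDouble(text);
    }
}
```

Note that a DecimalLiteral too large for an int makes Integer.parseInt throw NumberFormatException; Long.parseLong or java.math.BigInteger would be needed for wider values.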
--
Rick