[antlr-interest] Repeatedly parsing number literals

Rick Mann rmann at latencyzero.com
Sat Mar 28 22:43:08 PDT 2009


On Mar 28, 2009, at 22:36:31, Gavin Lambert wrote:

> At 15:44 29/03/2009, Rick Mann wrote:
> >DecimalLiteral
> >	: '0'..'9' '0'..'9'* { $value = };
> >
> >FloatingPointLiteral
> >	:	('0'..'9')+ '.' ('0'..'9')* Exponent?
> >	|	('0'..'9')+ Exponent
> >	|	('0'..'9')+
> >	;
>
> Note that these rules are lexically ambiguous -- the final alt of  
> FloatingPointLiteral is indistinguishable from DecimalLiteral, and  
> all of the alternatives share a common left prefix.  This is going  
> to get you into trouble.
>
> You should rewrite these two rules into a single lexer rule and left- 
> factor the common prefix away.

Well, you would think that this is true, but it turns out not to be. I  
lifted those rules from Terence's Java grammar. Sure enough, it works  
as expected, to the degree that if a parser calls for a float literal,  
and I give it a literal that would match DecimalLiteral, it complains.
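
For comparison, the left-factoring Gavin describes (consume the shared
digit prefix once, then decide what kind of literal it is) can be
sketched outside ANTLR as a small hand-written Java scanner. This is only
an illustration: the class and method names are hypothetical, and the
exponent form (e or E, optional sign, one or more digits) is an
assumption, since the Exponent rule is not shown in this thread.

```java
// Hypothetical hand-written scanner; not part of ANTLR or the quoted grammar.
final class NumberLiteralScanner {
    enum Kind { DECIMAL, FLOAT }

    // Classifies a complete literal, or throws if the text is not a number.
    static Kind classify(String text) {
        // Common left prefix: one or more digits, consumed exactly once.
        int i = skipDigits(text, 0);
        if (i == 0) {
            throw new IllegalArgumentException("expected a digit: " + text);
        }
        Kind kind = Kind.DECIMAL;

        // Optional '.' followed by zero or more fraction digits.
        if (i < text.length() && text.charAt(i) == '.') {
            kind = Kind.FLOAT;
            i = skipDigits(text, i + 1);
        }

        // Optional exponent (assumed form: e|E, optional sign, digits).
        if (i < text.length() && (text.charAt(i) == 'e' || text.charAt(i) == 'E')) {
            kind = Kind.FLOAT;
            i++;
            if (i < text.length() && (text.charAt(i) == '+' || text.charAt(i) == '-')) {
                i++;
            }
            int end = skipDigits(text, i);
            if (end == i) {
                throw new IllegalArgumentException("exponent needs digits: " + text);
            }
            i = end;
        }

        if (i != text.length()) {
            throw new IllegalArgumentException("unexpected trailing input: " + text);
        }
        return kind;
    }

    // Returns the index of the first non-digit character at or after 'from'.
    private static int skipDigits(String s, int from) {
        int i = from;
        while (i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9') {
            i++;
        }
        return i;
    }
}
```

With this shape a bare run of digits can only ever come out as DECIMAL,
so the overlap between the two quoted rules does not arise.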

> >And a number of parser rules that refer to them. Do I need
> >to write actions like this:
> >
> >$value = Integer.parseInt($DecimalLiteral.text);
>
> Yes.  The only return from a lexer rule is the token.
>
> Having said that, you *can* add custom data to a token (exactly how  
> you do that depends on your target language; Java requires  
> subclassing the token, for example), so it's not completely  
> impossible to deal with it at lexing time; but it's usually not  
> worth the hassle.

Thanks. I ended up making parser rules "decNum" and "floatNum". I hope  
that's kosher.
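
A minimal sketch of the conversions such parser rules would perform on
the token text, written as plain Java so it stands alone. The class name
LiteralValues is hypothetical; only the method names echo the "decNum"
and "floatNum" rules. In the grammar, the same calls would sit in each
rule's action and take $DecimalLiteral.text or
$FloatingPointLiteral.text as the argument. Using double rather than
float for the result is an assumption.

```java
// Hypothetical helpers mirroring what the decNum/floatNum actions would do.
final class LiteralValues {
    // Converts the text of a DecimalLiteral token to an int.
    // Throws NumberFormatException if the value does not fit in an int.
    static int decNum(String tokenText) {
        return Integer.parseInt(tokenText);
    }

    // Converts the text of a FloatingPointLiteral token to a double.
    // Double.parseDouble accepts forms such as "42." and "1e3".
    static double floatNum(String tokenText) {
        return Double.parseDouble(tokenText);
    }
}
```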


-- 
Rick


