[antlr-interest] Repeatedly parsing number literals

Gavin Lambert antlr at mirality.co.nz
Sat Mar 28 22:36:31 PDT 2009


At 15:44 29/03/2009, Rick Mann wrote:
 >DecimalLiteral
 >	: '0'..'9' '0'..'9'* { $value = };
 >
 >FloatingPointLiteral
 >	:	('0'..'9')+ '.' ('0'..'9')* Exponent?
 >	|	('0'..'9')+ Exponent
 >	|	('0'..'9')+
 >	;

Note that these rules are lexically ambiguous -- the final alt of 
FloatingPointLiteral matches exactly the same input as 
DecimalLiteral, and all of the alternatives share a common left 
prefix, so the lexer can't tell from the leading digits alone 
which token it is in.  This is going to get you into trouble.

You should rewrite these two rules into a single lexer rule and 
left-factor the common prefix away.
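
Something along these lines is roughly what I mean.  Treat it as 
an untested sketch: it assumes ANTLR 3 with the Java target, and 
that Exponent is a fragment rule defined elsewhere in your grammar 
(it isn't shown in your post).  The digits are matched once, and 
the token type is only changed if a fractional part or exponent 
follows.

```
// Declares the token type without a lexer rule of its own;
// it is only ever assigned from the action below.
tokens { FloatingPointLiteral; }

DecimalLiteral
	:	('0'..'9')+		// shared prefix, matched once
		(	'.' ('0'..'9')* Exponent?
			{ $type = FloatingPointLiteral; }
		|	Exponent
			{ $type = FloatingPointLiteral; }
		)?			// no suffix: stays a DecimalLiteral
	;
```

Parser rules can keep referring to DecimalLiteral and 
FloatingPointLiteral exactly as before; only the lexer changes.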

 >And a number of parser rules that refer to them. Do I need
 >to write actions like this:
 >
 >$value = Integer.parseInt($DecimalLiteral.text);

Yes.  The only return from a lexer rule is the token.

Having said that, you *can* attach custom data to a token, so 
handling the conversion at lexing time isn't impossible.  Exactly 
how depends on your target language; in Java, for example, you 
have to subclass the token.  It's usually not worth the hassle.
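
If the real concern is repeating that parseInt action in every 
rule, one option is to wrap the token in a small parser rule that 
returns the converted value, and reference that rule instead of 
the token.  A minimal sketch, assuming the Java target; the rule 
names integerValue and arraySize are made up for illustration and 
aren't from your grammar:

```
// Hypothetical helper rule: the one place that converts the text.
integerValue returns [int value]
	:	DecimalLiteral
		{ $value = Integer.parseInt($DecimalLiteral.text); }
	;

// Invented example of a rule that uses the converted value
// instead of repeating the conversion action.
arraySize returns [int size]
	:	'[' n=integerValue ']' { $size = $n.value; }
	;
```

That keeps the conversion in the parser, where the token text is 
already available, without needing a custom token class.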


