[antlr-interest] Repeatedly parsing number literals

Gavin Lambert antlr at mirality.co.nz
Sat Mar 28 22:53:29 PDT 2009


At 18:43 29/03/2009, Rick Mann wrote:
 >Well, you would think that this is true, but it turns out not to
 >be. I lifted those rules from Terence's Java grammar. Sure
 >enough, it works as expected, to the degree that if a parser
 >calls for a float literal, and I give it a literal that would
 >match DecimalLiteral, it complains.

The parser has no influence over the lexer, so what the parser 
calls for is irrelevant -- for a given bit of input text you will 
always get the same token regardless of parser context.

In theory, if you throw the input "1234" at the rules you posted, 
it should always end up being a DecimalLiteral (and you should 
have gotten an "unreachable alternative" warning for 
FloatLiteral).  This is because when two rules can be statically
determined to match the same input, ANTLR will normally pick
the rule that was listed first.  If you give it input such as
"1234.5" then you'll get a FloatLiteral; if you give it 
"1234.abcd" or "1234..1238" then you'll get a runtime error within 
the bowels of FloatLiteral, even if you have dot or double-dot 
tokens available.
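
To make that behaviour concrete, here is a toy model in Python. It is
not ANTLR's generated code, only a sketch of "predict a rule from
lookahead, then commit to it". The rule shapes are assumptions, since
the original rules are not quoted in this message: DecimalLiteral is
taken to be digits, FloatLiteral to be digits '.' digits. The name
lex_number is hypothetical.

```python
DIGITS = "0123456789"

def lex_number(text):
    """Toy model: choose a rule from lookahead, then commit to it."""
    i = 0
    while i < len(text) and text[i] in DIGITS:
        i += 1
    if i == 0:
        raise ValueError("expected a digit")
    # Prediction: a '.' right after the digits selects FloatLiteral.
    # Once chosen, there is no falling back to DecimalLiteral.
    if i < len(text) and text[i] == ".":
        j = i + 1
        while j < len(text) and text[j] in DIGITS:
            j += 1
        if j == i + 1:
            # No digit after the '.', so the committed rule fails.
            raise ValueError("FloatLiteral: expected digit at offset %d" % j)
        return ("FloatLiteral", text[:j])
    return ("DecimalLiteral", text[:i])
```

In this model "1234" and "1234.5" lex cleanly, while "1234.abcd" and
"1234..1238" raise inside the float branch, regardless of what any
caller was hoping to receive.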

Performance will definitely be improved if you combine and 
left-factor, though -- that'll reduce it to k=1 decisions, if done 
right.
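
A minimal sketch of the left-factoring idea, written as a hand-rolled
scanner in Python and not as an ANTLR grammar. The shared digit prefix
is scanned once and the token type is decided afterwards. The rule
shapes (digits, and digits '.' digits) and the names scan_digits and
lex_number_factored are assumptions for illustration. Note that this
sketch peeks one character past the '.' before taking the fraction;
that is a choice made here so the dot tokens survive, not something the
message prescribes.

```python
DIGITS = "0123456789"

def scan_digits(text, pos):
    """Return the index just past the run of digits starting at pos."""
    while pos < len(text) and text[pos] in DIGITS:
        pos += 1
    return pos

def lex_number_factored(text):
    """Scan the common digit prefix once, then decide the token type."""
    end = scan_digits(text, 0)
    if end == 0:
        raise ValueError("expected a digit")
    # Take the fraction only when a digit follows the '.', so that a
    # trailing '.' or '..' stays in the input for later tokens.
    if end + 1 < len(text) and text[end] == "." and text[end + 1] in DIGITS:
        end = scan_digits(text, end + 1)
        return ("FloatLiteral", text[:end], text[end:])
    return ("DecimalLiteral", text[:end], text[end:])
```

Each call returns the token type, the matched text, and the unconsumed
remainder, so "1234..1238" yields a DecimalLiteral with "..1238" left
over for a double-dot token.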


