[antlr-interest] newbie problem about expressions & number representations

Sun Dec 13 01:36:02 PST 2009

At 13:37 12/12/2009, David-Sarah Hopwood wrote:
 >I thought lexer rules were supposed to find the longest match?
 >How can they do that if they're unable to handle common left
 >prefixes?
 >
 >(I have the impression that "longest match" may not be quite
 >accurate, but if so, I've never seen the actual behaviour
 >documented precisely.)

In v3, for fixed-length input (e.g. keywords), the longest match
usually wins, yes.  (Though as Jim said, the order of rules also
plays a role.)  When loops are involved things get a bit more
murky.  I've heard mixed stories on how well it copes with that -- 
I think it might depend on whether it decides to generate a DFA or 
stick with lookahead conditions.  Even if it does manage to produce
something that works, it will certainly cost extra processing at
both compile time and runtime, so it's definitely something to
avoid.
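To make the "longest match, with rule order as tie-breaker" idea
concrete, here is a minimal Python sketch.  It is a simplified model
of the policy, not what ANTLR v3 actually generates (v3 builds
prediction DFAs); the rule names INT/FLOAT come from the thread, but
the regexes and the functions next_token/tokenize are hypothetical.
Note that every rule is tried at every position, which hints at why
overlapping rules cost extra work.

```python
import re

# Hypothetical token rules in priority order (INT listed first, as in
# the thread).  The regexes are illustrative assumptions, not ANTLR syntax.
RULES = [
    ("INT", re.compile(r"[0-9]+")),
    ("FLOAT", re.compile(r"[0-9]*\.[0-9]+([eE][+-]?[0-9]+)?"
                         r"|[0-9]+[eE][+-]?[0-9]+")),
]

def next_token(text, pos):
    """Return (type, lexeme) for the longest match starting at pos.

    Equal-length matches go to the rule listed first; raises ValueError
    if no rule matches.
    """
    best = None
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        # Only a strictly longer match replaces the current best, so a
        # later rule never wins a tie against an earlier one.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    if best is None:
        raise ValueError(f"no rule matches at offset {pos}")
    return best

def tokenize(text):
    """Split text into (type, lexeme) pairs using next_token repeatedly."""
    pos, tokens = 0, []
    while pos < len(text):
        kind, lexeme = next_token(text, pos)
        tokens.append((kind, lexeme))
        pos += len(lexeme)
    return tokens
```

For example, on "12345.6e-7" INT matches only "12345" while FLOAT
matches the whole string, so FLOAT wins on length.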

In v2, it's impossible to deal with.  v2 lexers operate with 
completely fixed lookahead; for example, if k=3 then it'll look 
ahead, see "123" and find that this matches both rules; it can't 
look any further ahead to disambiguate.  So it'll correctly
produce a FLOAT if the prefix before the decimal point or
exponent marker is zero to (k-1) digits long, but with any more
digits it'll probably produce an INT, since that rule was listed
first.  (And you always want k to be minimal, to reduce overhead
and improve performance.  So increasing k is not the right 
answer.  Refactoring the rules is.)
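The contrast can be sketched in Python.  This is a hypothetical model
(the names predict_fixed_k and lex_number are made up, and it is not
ANTLR v2's generated code): predict_fixed_k picks a rule by peeking at
only k characters, while lex_number stands in for a refactored single
number rule that consumes the shared digit prefix in a loop and only
then decides between INT and FLOAT, needing a few characters of
lookahead no matter how long the prefix is.

```python
def predict_fixed_k(text, pos, k):
    """Model of fixed-k prediction: choose INT or FLOAT from text[pos:pos+k]."""
    window = text[pos:pos + k]
    # FLOAT can only be predicted if its distinguishing character is visible.
    if any(c in ".eE" for c in window):
        return "FLOAT"
    return "INT"  # INT is listed first, so it wins when the window is ambiguous

def lex_number(text, pos):
    """Model of a refactored single number rule; returns (type, lexeme)."""
    i = pos
    while i < len(text) and text[i].isdigit():  # shared digit prefix
        i += 1
    kind = "INT"
    # Optional fraction: '.' followed by at least one digit.
    if i + 1 < len(text) and text[i] == "." and text[i + 1].isdigit():
        kind = "FLOAT"
        i += 1
        while i < len(text) and text[i].isdigit():
            i += 1
    if i == pos:
        raise ValueError(f"no number at offset {pos}")
    # Optional exponent: e/E, optional sign, at least one digit.
    # Scan speculatively with j and commit only if a digit follows.
    j = i
    if j < len(text) and text[j] in "eE":
        j += 1
        if j < len(text) and text[j] in "+-":
            j += 1
        if j < len(text) and text[j].isdigit():
            while j < len(text) and text[j].isdigit():
                j += 1
            kind, i = "FLOAT", j
    return kind, text[pos:i]
```

With k=3, predict_fixed_k("12.5", 0, 3) sees the '.' and predicts
FLOAT, but predict_fixed_k("123.5", 0, 3) sees only "123" and falls
back to INT; lex_number("123.5", 0) returns ("FLOAT", "123.5")
regardless of the prefix length.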