[antlr-interest] newbie problem about expressions & number representations
Gavin Lambert
antlr at mirality.co.nz
Sun Dec 13 01:36:02 PST 2009
At 13:37 12/12/2009, David-Sarah Hopwood wrote:
>I thought lexer rules were supposed to find the longest match?
>How can they do that if they're unable to handle common left
>prefixes?
>
>(I have the impression that "longest match" may not be quite
>accurate, but if so, I've never seen the actual behaviour
>documented precisely.)
In v3, for fixed-length input (eg. keywords), usually the longest
match wins, yes. (Though as Jim said, the order of rules also
plays a role.) When loops are involved things get a bit more
murky. I've heard mixed stories on how well it copes with that --
I think it might depend on whether it decides to generate a DFA or
stick with lookahead conditions. Even if it does manage to make
something that'll work, it'll certainly cause extra processing
both at compile and runtime, though, so it's definitely something
to be avoided.
In v2, it's impossible to deal with. v2 lexers operate with
completely fixed lookahead; for example, if k=3 then it'll look
ahead, see "123" and find that this matches both rules; it can't
look any further ahead to disambiguate. So it'll correctly
produce a FLOAT if the prefix prior to the decimal point or
exponent marker is zero to (k-1) digits long, but any more than
that and it'll probably make an INT, since that rule was listed
first. (And you always want k to be minimal, to reduce overhead
and improve performance. So increasing k is not the right
answer. Refactoring the rules is.)
More information about the antlr-interest
mailing list