[antlr-interest] Lexer bug?

Sun Oct 21 17:49:19 PDT 2007

Jim Idle wrote:
> This isn't a bug.

Nonsense. Any lexer that consumes characters that aren't a legal token,
and announces a legal token without flagging an error, has a bug.

Lexer rules don't need to "see" any other lexer rule. Each one just needs
to consume as many characters as make a legal token, then stop. Among the
rules that match the same input, the one that consumes the most
characters is declared the winner; if two consume the same number of
characters, the one further up in precedence wins.
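
A minimal sketch of that longest-match selection, in Python rather than
ANTLR, and not a description of how ANTLR itself works. The rule names
(REAL, INT, DOTDOT, WS) and their patterns are assumptions made up for
this illustration; list order stands in for precedence.

```python
import re

# Hypothetical rule set, listed in precedence order (earlier wins ties).
RULES = [
    ("REAL", re.compile(r"\d+\.\d+")),
    ("INT", re.compile(r"\d+")),
    ("DOTDOT", re.compile(r"\.\.")),
    ("WS", re.compile(r"\s+")),
]


def tokenize(text):
    """Split text into (rule name, lexeme) pairs using longest match."""
    pos = 0
    tokens = []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # A strictly longer match wins; on equal length the earlier
            # (higher-precedence) rule is kept.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            # No rule matched: flag an error instead of inventing a token.
            raise ValueError(f"illegal input at offset {pos}")
        if best_name != "WS":
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens
```

With these rules, tokenize("1..2.5") gives INT "1", DOTDOT "..", and
REAL "2.5", while tokenize("1.") raises an error at the stray ".".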

This rule consumes digits and one ".", then stops - and that's not a
legal token.
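
For contrast, here is a sketch of a hand-written number scanner that
never reports "digits plus one dot" as a token. It remembers the last
position at which a legal token ended and falls back to it. The function
name scan_number and the INT/REAL labels are hypothetical, chosen only
for this example.

```python
DIGITS = "0123456789"


def scan_number(text, pos):
    """Return (kind, end) for the longest legal number at pos, or None."""
    i = pos
    while i < len(text) and text[i] in DIGITS:
        i += 1
    if i == pos:
        return None  # no digits at all, so this is not a number
    accept = ("INT", i)  # last position where a legal token ended
    if i < len(text) and text[i] == ".":
        j = i + 1
        while j < len(text) and text[j] in DIGITS:
            j += 1
        if j > i + 1:
            # At least one fraction digit followed the dot: a legal REAL.
            accept = ("REAL", j)
        # Otherwise "digits ." is not a legal token, so keep the INT and
        # leave the dot unconsumed for whatever rule comes next.
    return accept
```

On "1..2" starting at offset 0 this returns ("INT", 1), leaving both
dots in the input for the range operator.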

Your proposed technique is interesting and maybe useful, but doesn't
produce the desired behaviour, as it won't match a trailing real number.
I don't want my lexer to produce one token where there should be three,
and then have to break it apart again later - especially since two of
the three tokens have a non-trivial structure. I could hack it into
working, but for now, I'm just requiring a space before the "..".

My hand-coded lexers almost never used if/then/else; they used character
classification tables and switch/case. This range problem is trivial
with flex, as well.
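
A rough sketch of the table-driven style, written in Python with a dict
lookup standing in for C's switch/case. All class, state, and token names
here are assumptions invented for the example; it is not taken from any
real lexer or from flex output.

```python
# Character classes (hypothetical names for this sketch).
OTHER, DIGIT, DOT, SPACE = range(4)

# Classification table indexed by character code, built once.
CHAR_CLASS = [OTHER] * 128
for c in "0123456789":
    CHAR_CLASS[ord(c)] = DIGIT
CHAR_CLASS[ord(".")] = DOT
for c in " \t\n":
    CHAR_CLASS[ord(c)] = SPACE

# DFA states.
START, IN_INT, INT_DOT, IN_REAL, ONE_DOT, TWO_DOTS, IN_SPACE = range(7)

# Transition table: (state, class) -> next state. A missing entry means stop.
NEXT = {
    (START, DIGIT): IN_INT,
    (START, DOT): ONE_DOT,
    (START, SPACE): IN_SPACE,
    (IN_INT, DIGIT): IN_INT,
    (IN_INT, DOT): INT_DOT,
    (INT_DOT, DIGIT): IN_REAL,
    (IN_REAL, DIGIT): IN_REAL,
    (ONE_DOT, DOT): TWO_DOTS,
    (IN_SPACE, SPACE): IN_SPACE,
}

# Accepting states and the token each one produces.
ACCEPT = {IN_INT: "INT", IN_REAL: "REAL", TWO_DOTS: "DOTDOT", IN_SPACE: "WS"}


def classify(ch):
    code = ord(ch)
    return CHAR_CLASS[code] if code < 128 else OTHER


def table_lex(text):
    tokens, pos = [], 0
    while pos < len(text):
        state, i, last = START, pos, None
        while i < len(text):
            state = NEXT.get((state, classify(text[i])))
            if state is None:
                break
            i += 1
            if state in ACCEPT:
                last = (ACCEPT[state], i)  # remember the last legal token end
        if last is None:
            raise ValueError(f"no legal token at offset {pos}")
        kind, end = last
        if kind != "WS":
            tokens.append((kind, text[pos:end]))
        pos = end  # anything scanned past the last accept is re-read
    return tokens
```

Here INT_DOT is deliberately not an accepting state, so "1..2.5" comes
out as INT, DOTDOT, REAL with no space needed before the "..".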

Clifford Heath.