[antlr-interest] Lexer bug?

Tue Oct 23 05:26:42 PDT 2007

Jim Idle wrote:
> As Loring stated, the important thing is to distill from such 
> discussion, those serious issues that (though he did not state this 
> explicitly) are founded on sound theory, or developing theory

Well, on that note, the definition of correctness really doesn't need
much theory. At the current point in the input, all lexer rules are
active. When the next character is considered, most of them drop out
of the race. Lexing continues this way until all have dropped out, and
then a winner is declared: the longest match, or whatever other rule
the lexer defines. The point is that it's a game of Go! All rules
start, not all finish, the one with the biggest score wins, and the
process starts over.
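
To make the race concrete, here is a minimal Python sketch of the
longest-match contest. It is not how Antlr or Flex implement it: each
rule is simply tried with Python's re module instead of being run as a
parallel automaton, and the rule names and patterns are made up for
illustration. Ties go to the rule listed first, which is one common
convention and an assumption here. Note that re picks the first
alternative that succeeds, not the longest, so the sketch is only
faithful for simple patterns like these.

```python
import re

# Hypothetical rule set, in priority order (earlier rules win ties).
RULES = [
    ("IF", re.compile(r"if")),
    ("IDENT", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("INT", re.compile(r"[0-9]+")),
    ("WS", re.compile(r"[ \t\n]+")),
]


def next_token(text, pos):
    """Return (name, lexeme) for the longest match at pos, or None."""
    best = None
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        if m is None or m.end() == pos:
            continue  # this rule dropped out of the race
        # Strictly longer wins; on equal length the earlier rule is kept.
        if best is None or len(m.group()) > len(best[1]):
            best = (name, m.group())
    return best


def tokenize(text):
    """Repeat the race from each new position until input is consumed."""
    pos = 0
    tokens = []
    while pos < len(text):
        tok = next_token(text, pos)
        if tok is None:
            raise ValueError(f"no rule matches at offset {pos}: {text[pos]!r}")
        tokens.append(tok)
        pos += len(tok[1])
    return tokens
```

With these rules, "if" lexes as IF because IF and IDENT tie at two
characters and IF is listed first, while "iffy" lexes as IDENT because
IDENT gets further before dropping out.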

The mechanism of how to run all these independent machines efficiently
is where the theory comes in! And in knowing what kinds of patterns
they can match...

Correctness is determined by whether each rule is matched correctly,
and whether the winner is chosen correctly. That's all. Invalid input
should be thrown to a recovery method that is not silent by default.
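
As a rough illustration of "not silent by default", the sketch below
passes invalid input to a pluggable handler whose default is to raise.
All names here (LexError, raise_error, skip_and_log, the combined TOKEN
pattern) are hypothetical and not taken from Antlr or Flex. A caller
who wants to skip bad characters has to ask for that explicitly.

```python
import re

# Hypothetical combined token pattern: identifiers, integers, whitespace.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[0-9]+|\s+")


class LexError(Exception):
    """Raised when no rule matches at the current position."""


def raise_error(text, pos):
    # Default recovery: fail loudly instead of dropping input.
    raise LexError(f"unexpected character {text[pos]!r} at offset {pos}")


def skip_and_log(log):
    """Build a handler that records the bad character and skips it."""
    def handler(text, pos):
        log.append((pos, text[pos]))
        return pos + 1  # a handler must return a position past pos
    return handler


def tokenize(text, on_error=raise_error):
    pos, lexemes = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m:
            lexemes.append(m.group())
            pos = m.end()
        else:
            pos = on_error(text, pos)
    return lexemes
```

Calling tokenize("a $ b") raises LexError, whereas passing
on_error=skip_and_log(log) continues past the "$" and leaves a record
of it in log.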

If you want to know what "can" be done, just look at Flex. It works much
better than the Antlr lexer, judging by what I've seen. But if you meant
what can be done by the Antlr lexer...

> My own leaning, while I understand completely the theoretical basis of 
> all the points made thus far, is only that we should enable the 
> realization of lexers, parsers and tree walkers with what is the 
> actuality of the moment,

Antlr's engine is not the only one available. If it's not as good as
an alternative, people who need that will go elsewhere, and that'd be
a shame, because there are a lot of really nice things about Antlr
and the tools around it.

> There are few of us that are writing full blown compilers these days 
> [slight paraphrase from something said by Terence], but many people 
> would like to know how to knock up a parser for something slightly more 
> complicated than x=y newline a=b etc. For such needs, a guide to just 
> getting the thing done efficiently is probably more useful than 
> discussions of just what the difference between LALR, LR, LL, LK, NBA 
> and LXMAKEITUPHERE is. The interest and validity in theory is obvious, 
> but for many the question is "So what do I do to make this work?" is 
> probably more poignant :-)

I'd be keen to see some tips on how to extend the Ruby templates to build
a tree parser. I've fiddled with the templates a bit, but it's not always
clear, for example, what variables and enumerations are made available to
each template... a catalog would help a lot.

Clifford Heath.