[antlr-interest] Why don't parsers support character ranges?

Wed Apr 23 17:55:34 PDT 2008

On Wednesday 23 April 2008 17:01, Peter Nann wrote:
> Hmmm, I was hoping for more than the 'efficiency' argument...
> I am wondering if that argument is about 10 years past its use-by
> date...
> We are not in the days of single-digit-Megahertz and RAM measured in
> k anymore... when lexx and yacc were written...

Well, ANTLR goes well beyond lex and yacc. However, if you believe that 
lexer / parser stratification is no longer justified, you could set out 
to prove that thesis by writing a unified lexer / parser generator 
tool. (That does everything current tools do!) Many good current parser 
generators are open source (including ANTLR, of course), so you can 
exploit the techniques they use and that you like and replace or 
improve the ones you don't.

Personally, I'm not sure stratifying the lexical and syntactic analysis 
is a bad thing. I've certainly never found it to be a problem, and I've 
written my share of parsers, using lex & yacc (or flex and bison, I 
guess), JavaCC, ANTLR 2.x and 3.x. The only thing I don't care for is 
the use of alphabetic case to distinguish lexical from syntactical 
rules.

> It would depend on the scale of parsing you need to do of course, but
> for small-scale parsing I would question whether CPU and RAM matters
> any more on that task...

You know, there's a reason we don't call them "little languages" any 
more. They are never little and they never were little! And while it's 
legitimate to make a considered choice about trading off, say, 
developer time and execution time, it's not really OK to do something 
slowly when you don't get something in turn for it.

> I will have to take your word about 'combinatorial explosion' for
> some problems, but I thought simple RDP's could pretty much break
> down to one branch (as in: switch statement) per character (or token
> if you tokenize it), which doesn't seem excessive, or combinatorial.

You may still want to produce a DFA, and that can in general yield and 
exponential increase in the number of states. Not stratifying the 
lexical and syntactic layers will exacerbate that problem (I think).

And I don't have any idea about the consequences of unifying lexical 
analysis with syntax analysis in the face of arbitrary or variable 
look-ahead or backtracking.

Lastly, I still think lexical states (as they exist in JavaCC, e.g.) 
would be a good thing. It seems that would be harder to do when the 
lexer is not separated from the parser.

>  - But, yes, that was just my CS101 project!

Interesting. If you got that stuff in CS101, you must have gotten one 
hell of a CS education.

> ...
>
> Sorry to be a sour-puss, but I was quite excited about ANTLR at first
> look, but then got disappointed very quickly, so I'm a bit like a
> child who just broke his favourite toy...  ;-)

Show the ANTLR principals wrong by besting them at their own game. If 
you drop the sour-puss act, they'll probably wish you well, even help 
you, and certainly congratulate you if you succeed.

Randall Schulz