[antlr-interest] Why don't parsers support character ranges?
Randall R Schulz
rschulz at sonic.net
Wed Apr 23 17:55:34 PDT 2008
On Wednesday 23 April 2008 17:01, Peter Nann wrote:
> Hmmm, I was hoping for more than the 'efficiency' argument...
> I am wondering if that argument is about 10 years past its use-by
> We are not in the days of single-digit-Megahertz and RAM measured in
> k anymore... when lexx and yacc were written...
Well, ANTLR goes well beyond lex and yacc. However, if you believe that
lexer / parser stratification is no longer justified, you could set out
to prove that thesis by writing a unified lexer / parser generator
tool. (That does everything current tools do!) Many good current parser
generators are open source (including ANTLR, of course), so you can
exploit the techniques they use and that you like and replace or
improve the ones you don't.
Personally, I'm not sure stratifying the lexical and syntactic analysis
is a bad thing. I've certainly never found it to be a problem, and I've
written my share of parsers, using lex & yacc (or flex and bison, I
guess), JavaCC, ANTLR 2.x and 3.x. The only thing I don't care for is
the use of alphabetic case to distinguish lexical from syntactical
> It would depend on the scale of parsing you need to do of course, but
> for small-scale parsing I would question whether CPU and RAM matters
> any more on that task...
You know, there's a reason we don't call them "little languages" any
more. They are never little and they never were little! And while it's
legitimate to make a considered choice about trading off, say,
developer time and execution time, it's not really OK to do something
slowly when you don't get something in turn for it.
> I will have to take your word about 'combinatorial explosion' for
> some problems, but I thought simple RDP's could pretty much break
> down to one branch (as in: switch statement) per character (or token
> if you tokenize it), which doesn't seem excessive, or combinatorial.
You may still want to produce a DFA, and that can in general yield and
exponential increase in the number of states. Not stratifying the
lexical and syntactic layers will exacerbate that problem (I think).
And I don't have any idea about the consequences of unifying lexical
analysis with syntax analysis in the face of arbitrary or variable
look-ahead or backtracking.
Lastly, I still think lexical states (as they exist in JavaCC, e.g.)
would be a good thing. It seems that would be harder to do when the
lexer is not separated from the parser.
> - But, yes, that was just my CS101 project!
Interesting. If you got that stuff in CS101, you must have gotten one
hell of a CS education.
> Sorry to be a sour-puss, but I was quite excited about ANTLR at first
> look, but then got disappointed very quickly, so I'm a bit like a
> child who just broke his favourite toy... ;-)
Show the ANTLR principals wrong by besting them at their own game. If
you drop the sour-puss act, they'll probably wish you well, even help
you, and certainly congratulate you if you succeed.
More information about the antlr-interest