[antlr-interest] Why don't parsers support character ranges?

Wed Apr 23 20:03:08 PDT 2008

You may have already stated what you were trying to do, but scanning the thread doesn't make it obvious to me, so perhaps a paragraph on the requirement would help here? ANTLR isn't the best tool for every parsing job, of course. HTML isn't easy, for instance, and XML is very awkward because it was designed to be 'simple to parse', as in you can do it with a bit of hand-crafted code that will be very fast and probably isn't difficult to maintain, because it isn't that complicated a language. I dealt with it in the lexer for a VB.Net parser, for instance.

I am sure you can get a bit more help for your parsing, but if you already came up with something else that works, then that might be just fine for you. Mostly it is the fragility of hand-crafted parsers that makes generated recognizers so attractive.

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Peter Nann
> Sent: Wednesday, April 23, 2008 6:07 PM
> To: Randall R Schulz; antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Why don't parsers support character
> ranges?
> 
> 
> Randall R Schulz wrote:
> > If you got that stuff in CS101,
> 
> OK, Maybe it was CS201... But you know what I mean.
> 
> > Show the ANTLR principals wrong by besting them at their own game. If
> > you drop the sour-puss act, they'll probably wish you well, even help
> > you, and certainly congratulate you if you succeed.
> 
> I think the problem was that my task was quite simple.
> I think ANTLR makes hard things easier (has many cool features for
> that), but in my case my simple task didn't turn out to be a simple
> solution in ANTLR.
> 
> No tool is perfect, and (almost?) no tool can maintain a linear
> relationship between problem complexity and solution complexity, I get
> that.
> 
> So, let's just put it down to bad luck for my specific requirements.
> ANTLR does look like an awesome tool for a very broad range of more
> complex problems, and I'll leave it at that!
> 
> 
> Thanks for your well put response(s).
> 
> 
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Randall R Schulz
> Sent: Thursday, 24 April 2008 10:56 AM
> To: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Why don't parsers support character
> ranges?
> 
> On Wednesday 23 April 2008 17:01, Peter Nann wrote:
> > Hmmm, I was hoping for more than the 'efficiency' argument...
> > I am wondering if that argument is about 10 years past its use-by
> > date...
> > We are not in the days of single-digit-Megahertz and RAM measured in
> > k anymore... when lex and yacc were written...
> 
> Well, ANTLR goes well beyond lex and yacc. However, if you believe that
> lexer / parser stratification is no longer justified, you could set out
> to prove that thesis by writing a unified lexer / parser generator
> tool. (That does everything current tools do!) Many good current
> parser generators are open source (including ANTLR, of course), so you
> can exploit the techniques they use and that you like and replace or
> improve the ones you don't.
> 
> Personally, I'm not sure stratifying the lexical and syntactic analysis
> is a bad thing. I've certainly never found it to be a problem, and I've
> written my share of parsers, using lex & yacc (or flex and bison, I
> guess), JavaCC, ANTLR 2.x and 3.x. The only thing I don't care for is
> the use of alphabetic case to distinguish lexical from syntactical
> rules.
> 
> 
> > It would depend on the scale of parsing you need to do of course, but
> > for small-scale parsing I would question whether CPU and RAM matters
> > any more on that task...
> 
> You know, there's a reason we don't call them "little languages" any
> more. They are never little and they never were little! And while it's
> legitimate to make a considered choice about trading off, say,
> developer time and execution time, it's not really OK to do something
> slowly when you don't get something in return for it.
> 
> 
> > I will have to take your word about 'combinatorial explosion' for
> > some problems, but I thought simple RDP's could pretty much break
> > down to one branch (as in: switch statement) per character (or token
> > if you tokenize it), which doesn't seem excessive, or combinatorial.
> 
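For readers following the "one branch per character" idea, here is a minimal sketch in Python of a recursive-descent parser that works directly on characters, with the digit range tested inside the parser rather than in a separate lexer. The grammar (sums of integers with parentheses), the CharParser class and the parse function are all made up for illustration; nothing here comes from ANTLR or from the original poster's code.

```python
class CharParser:
    """Hypothetical recursive-descent parser that reads characters directly."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        # Current character, or '' at end of input.
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expr(self):
        # expr : term ('+' term)* ;
        value = self.term()
        while self.peek() == "+":
            self.pos += 1
            value += self.term()
        return value

    def term(self):
        # term : DIGITS | '(' expr ')' ;  one branch per leading character
        c = self.peek()
        if "0" <= c <= "9":  # character range, tested in the parser itself
            return self.digits()
        if c == "(":
            self.pos += 1
            value = self.expr()
            if self.peek() != ")":
                raise SyntaxError(f"expected ')' at offset {self.pos}")
            self.pos += 1
            return value
        raise SyntaxError(f"unexpected {c!r} at offset {self.pos}")

    def digits(self):
        # DIGITS : ('0'..'9')+ ;  consumed one character at a time
        start = self.pos
        while "0" <= self.peek() <= "9":
            self.pos += 1
        return int(self.text[start:self.pos])


def parse(text):
    """Parse the whole string and return the computed sum."""
    parser = CharParser(text)
    result = parser.expr()
    if parser.pos != len(text):
        raise SyntaxError(f"trailing input at offset {parser.pos}")
    return result
```

With these definitions, parse("12+(3+4)") returns 19. The sketch does not skip whitespace or comments; dealing with those in every rule is one of the chores a separate lexer normally takes off the parser's hands.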
> You may still want to produce a DFA, and that can in general yield an
> exponential increase in the number of states. Not stratifying the
> lexical and syntactic layers will exacerbate that problem (I think).
> 
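The exponential growth mentioned above is easy to see on a textbook case, independent of ANTLR. The sketch below, in Python, builds an NFA with n+1 states for "the n-th symbol from the end is 'a'" and counts the states that the subset construction reaches. The function names are hypothetical, and this only illustrates the general NFA-to-DFA blow-up, not how ANTLR itself builds its prediction DFAs.

```python
def nfa_nth_from_end(n):
    """NFA over {'a', 'b'} for strings whose n-th symbol from the end is 'a'.

    States are 0..n; state 0 is the start state and state n is accepting.
    Returns a dict mapping (state, symbol) to the set of next states.
    """
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for state in range(1, n):
        for symbol in "ab":
            delta[(state, symbol)] = {state + 1}
    return delta


def count_dfa_states(delta, start=0, alphabet="ab"):
    """Count the DFA states reachable through the subset construction."""
    start_set = frozenset({start})
    seen = {start_set}
    worklist = [start_set]
    while worklist:
        current = worklist.pop()
        for symbol in alphabet:
            # The DFA target is the union of NFA moves from every member state.
            target = frozenset(
                nxt
                for state in current
                for nxt in delta.get((state, symbol), ())
            )
            if target not in seen:
                seen.add(target)
                worklist.append(target)
    return len(seen)


for n in range(1, 6):
    dfa_states = count_dfa_states(nfa_nth_from_end(n))
    print(n + 1, "NFA states ->", dfa_states, "DFA states")
```

For this family the count comes out as 2^n, because the DFA has to remember which of the last n symbols were 'a'.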
> And I don't have any idea about the consequences of unifying lexical
> analysis with syntax analysis in the face of arbitrary or variable
> look-ahead or backtracking.
> 
> Lastly, I still think lexical states (as they exist in JavaCC, e.g.)
> would be a good thing. It seems that would be harder to do when the
> lexer is not separated from the parser.
> 
> 
> >  - But, yes, that was just my CS101 project!
> 
> Interesting. If you got that stuff in CS101, you must have gotten one
> hell of a CS education.
> 
> 
> > ...
> >
> > Sorry to be a sour-puss, but I was quite excited about ANTLR at first
> > look, but then got disappointed very quickly, so I'm a bit like a
> > child who just broke his favourite toy...  ;-)
> 
> Show the ANTLR principals wrong by besting them at their own game. If
> you drop the sour-puss act, they'll probably wish you well, even help
> you, and certainly congratulate you if you succeed.
> 
> 
> Randall Schulz