[antlr-interest] Why don't parsers support character ranges?

Thu Apr 24 11:36:30 PDT 2008

----- Original Message ----
> From: Hannes Schmidt <antlr5 at hannesschmidt.net>
> To: Magnus Danielson <magnus at rubidium.dyndns.org>
> Cc: antlr-interest at antlr.org
> Sent: Thursday, April 24, 2008 8:56:38 AM
> Subject: Re: [antlr-interest] Why don't parsers support character ranges?
> 
....
> > I have to disagree with you. As I have addressed this particular problem when
> > using PCCTS (don't ask and I won't tell why) I found that the best method to
> > handle these types of problems was to divide the lexer grammar into groups
> > (this isn't the correct term, but it is easy enought to dig it up).
> > By changing the active lexer group as I progress throught the grammar, I also
> > changes the set of token definitions which can be generated and any inclusion
> > problems as you indicate will be eliminated.
> >
> >  
> Agreed, this is a nifty solution and a viable alternative to what 
> Johannes and Troy have suggested (manually making tokens disjunctive). 
> It's just that this is too much overhead for 80% of all grammars and 80% 
> of all ANTLR users. Eliminate the lexer concept and all of this goes 
> away! For the performance-conscious there should be an option to  front  
> the  parser with a token-generating  lexer, but that should be optional, 
> not the default. I'm guessing that the majority of users would rather 
> start with an intuitive grammar and see if it needs optimizing. The 
> forced lexer/parser separation is a typical case of premature 
> optimization. Don't get me wrong, I love ANTLR, it is far better than 
> anything I have used before, especially because of it's power and 
> flexibility. But that doesn't mean we can't improve it, right?

This is an implementation artifact; unifying parser and lexer cannot fix this.  Ter does not currently use FOLLOW sets in lexers (64K bits is a bit large for a bit set, especially when there might be 100 of them per lexer) but does in parsers.  As a result, lexers are currently less capable than  parsers.  This is fixable,  but it requires more analysis of lexer rules and character sets, and it makes sense to hold off on the additional analysis until after ANTLR 3 is implemented in ANTLR 3.

Once Ter fixes this particular problem, you will have your dream:  you will be able to do everything you wish to in the lexer without ever having to feed it to a parser.  Of course, that assumes that you can solve your problem with minimal additional analysis.  That is not a very good assumption, but experience should teach you that.

If you really want to do this now, I suggest that you use ANTLR 2.  All of the lexing capabilities were present there (excepting hoisting predicates and the need to specify maximum lookahead).  Oddly enough, these capabilities have rarely been exploited:  people learned to partition their problems across multiple translation passes.

--Loring

      ____________________________________________________________________________________
Be a better friend, newshound, and 
know-it-all with Yahoo! Mobile.  Try it now.  http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ