[antlr-interest] Lexer bug?

Mon Oct 22 23:46:19 PDT 2007

All the comments are well met by me, and I delight in the fact that we have
re-established, if indeed it was ever lost, a level of civility that is
becoming of a difficult and absorbing topic. 

As Loring stated, the important thing is to distill from such discussion,
those serious issues that (though he did not state this explicitly) are
founded on sound theory, or developing theory, recognize where the current
system departs from this, and think about how to deal with it. It is my only
wish to divide this from what CAN be done now. Separate threads can then
deal with the sound (or even unsound) points about the analysis of
constructs as well as how close to LL(*) they are or might be. If one finds
a working path through discussion here, then this is a good thing indeed.

My own leaning, while I understand completely the theoretical basis of all
the points made thus far, is only that we should enable the realization of
lexers, parsers and tree walkers with what is the actuality of the moment,
while endeavoring to further the actualization of theories both long
established and being explored, while maintaining an attitude that promotes
both exploration and correction of paths new and old. (Apologies for all the
commas in that statement Mr. Wittgenstein).

I thank everyone for their courtesy and well made points - we will all
benefit from such discourse. I have felt for some time that a number of wiki
articles are needed that concern the practical course of writing grammars -
if you like, the cookbook that complements Ter's book. Perhaps this is the
moment to write such things and perhaps I will write some. It is a concern
that such an article may draw comments and critique outside the domain that
it is meant for, but what the hey! 

There are few of us that are writing full blown compilers these days [slight
paraphrase from something said by Terence], but many people would like to
know how to knock up a parser for something slightly more complicated than
x=y newline a=b etc. For such needs, a guide to just getting the thing done
efficiently is probably more useful than discussions of just what the
difference between LALR, LR, LL, LK, NBA and LXMAKEITUPHERE is. The interest
and validity in theory is obvious, but for many the question "So what do
I do to make this work?" is probably more poignant :-)
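
As a small illustration of the "make this work" angle, here is a minimal
sketch, in plain Python rather than ANTLR, of a hand-written lexer for input
of the x=y newline a=b kind. It is not ANTLR-generated code and does not use
any ANTLR API; the names TOKEN_RE, LexError, tokenize and the strict flag are
all hypothetical. It only shows the two behaviours debated in the quoted
thread: stop with an error on an illegal character, or skip it, record the
problem, and carry on.

```python
import re
from typing import List, Tuple

# Hypothetical token rules for input such as "x=y\na=b" (assumed for this sketch).
TOKEN_RE = re.compile(
    r"(?P<ID>[A-Za-z_]\w*)|(?P<EQ>=)|(?P<NL>\n)|(?P<WS>[ \t]+)"
)


class LexError(Exception):
    """Raised in strict mode when no token rule matches."""


def tokenize(text: str, strict: bool = True) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split text into (kind, lexeme) pairs.

    strict=True  : raise LexError at the first illegal character.
    strict=False : skip one character, record a message, keep going.
    """
    tokens: List[Tuple[str, str]] = []
    errors: List[str] = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            msg = f"illegal character {text[pos]!r} at offset {pos}"
            if strict:
                raise LexError(msg)
            errors.append(msg)  # lenient: never silent, always recorded
            pos += 1
            continue
        if m.lastgroup != "WS":  # whitespace is matched but not emitted
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens, errors
```

Calling tokenize("x=$y") raises LexError, while tokenize("x=$y", strict=False)
returns the three legal tokens plus one recorded error. Either way nothing is
consumed without being reported, and the caller picks the policy at run time.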

Jim   

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Kirby Bohling
> Sent: Monday, October 22, 2007 7:01 PM
> To: Clifford Heath
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Lexer bug?
> 
> On 10/22/07, Clifford Heath <clifford.heath at gmail.com> wrote:
> > Jim Idle wrote:
> > >  > Jim Idle wrote:
> > >  > > This isn't a bug.
> > >  > Nonsense. Any lexer that consumes characters that aren't a legal
> > >  > token, and announces a legal token without flagging an error,
> > >  > has a bug.
> > > It wasn't my intention to offend and elicit an emphatic "nonsense"
> > > response. However I should point out that the accusation is of
> > > course erroneous. The lexer produces code that is in line with the
> > > original design.
> >
> > First up, let me say that I'm sorry my post was thought uncivil. I do
> > appreciate the helpful discussion and workarounds offered, and I
> > don't mean to disparage anyone.
> >
> > However, I still maintain that the job of a lexer is to divide the
> > input into tokens, without discarding any. If it's unable to do that,
> > it must report an error. If not, then the tokens must be correctly
> > matched. There is no middle path, and any design that allows one is
> > faulty, even if the code implements the design perfectly. Such
> > principles are black-and-white, and that's why I used the word
> > "nonsense".
> >
> 
> I could see how a person could perceive your statement of "black and
> white" as being too strongly worded.  You can have strong opinions,
> but that's stated as an absolute fact. I think the design of Antlr
> worked fairly hard to not follow a principle you consider absolute.  I
> happen to agree with you, but that's beside the point.
> 
> I think you just disagreed with a fundamental decision of Antlr3
> (Antlr2 might also do it, but I don't know)... I mean Antlr3 works
> fairly hard to recover by skipping a single token and proceeding.
> 
> During interactive behavior, it seems like it'd be really nice, but
> during a batch run (like compiling source), I really dislike it.  It'd
> be nice if I had some opportunity to programmatically control that.
> I'd also say that fundamentally, I'd really like it if Antlr did some
> of these:
> 
> 1. Warned me at generation time that my grammar has an LL(1) case
> where the error recovery might do something counter-intuitive.  So at
> least, I'd know something was off before discovering it while testing
> my grammar.
> 
> 2. Gave me programmatic ability to disable the LL(1) recovery at
> generation and/or run time, preferably run time (or the ability to
> generate two different parsers for the same grammar, one with error
> recovery, the other without).
> 
> 3. It never used LL(1) recovery until it had exhaustively searched for
> other solutions.
> 
> If I could figure out how to get Antlr building, I'd try and help.
> Alas the Ant scripts are failing me, and I haven't had time to fix it
> (I think it's mostly that Antlr 2.7 isn't installed correctly for
> Ant to pick it up).
> 
> Fundamentally, the automatic recovery feels like it can cause some of
> the same problems that HTML and Web Browsers did forever.  Given some
> input that is really close to what I want, but is slightly wrong,
> leads to very strange behavior because some tool is guessing what I
> meant instead of saying "I'm sorry Dave, I'm afraid I can't do that.".
>  I'd really like a way to put Antlr into a very, very strict mode.
> Hacking around in the exception handling of both the parser and the
> lexer is just inelegant.
> 
> Alas, I get nothing but silence so far.  Hopefully folks don't find my
> e-mails too annoying.
> 
> Thanks,
>     Kirby
> 
