[antlr-interest] Why ANTLR doesn't check existence of lexical symbols?

Wed Jun 11 14:36:12 PDT 2003

1.  ANTLR doesn't really know the difference between AST node types and
Token types, because they are just integers that either get passed from a
Token to an AST upon AST creation or set in an action that is building an
AST node.  ANTLR would have to analyze the lexer and the parser action code
to figure out what was going on.  Then it would have to hope that there are
no TokenStreamFilters in between that are modifying things.  And there is a
definite limit to how smart ANTLR can be in analyzing actions because it's
not going to penetrate into method calls that return node types, for
instance.
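
To make point 1 concrete, here is a minimal sketch in plain Java (not ANTLR code) of the naive check being asked for: take every symbol the parser mentions and subtract the rules the lexer defines. The class name, method name, and symbol names are all made up for illustration. EXPR stands in for an AST-only node type that no lexer is ever meant to produce, and SEMI for a token somebody forgot to add to the lexer.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class TokenCheckSketch {
    // Naive check: every symbol the parser mentions that the lexer has no rule for.
    // All names here are hypothetical; this is not part of ANTLR.
    static Set<String> missingFromLexer(Set<String> parserSymbols, Set<String> lexerRules) {
        Set<String> missing = new TreeSet<>(parserSymbols);
        missing.removeAll(lexerRules);
        return missing;
    }

    public static void main(String[] args) {
        // EXPR is assumed to be an AST-only node type; SEMI is assumed to be a forgotten lexer rule.
        Set<String> parserSymbols = new TreeSet<>(Arrays.asList("ID", "INT", "PLUS", "EXPR", "SEMI"));
        Set<String> lexerRules = new TreeSet<>(Arrays.asList("ID", "INT", "PLUS"));

        // Both symbols are reported, and the check cannot tell them apart.
        System.out.println(missingFromLexer(parserSymbols, lexerRules)); // prints [EXPR, SEMI]
    }
}
```

The set difference flags EXPR and SEMI alike, because to the check they are both just names that map to integers.  Telling them apart would need extra information, such as which symbols are AST-only or what a filter between the lexer and parser does to the stream.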

2.  See the tests that come with the gcc grammar.  I've got like 88 of them
but most are just a few lines, the simplest C program that uses a feature.
If I add a wacky rule I definitely want a test there to prove it is working.
And GCC has some real wackiness!  A side benefit is that I can run all the
other tests to see if I broke something else.

3.  Unit tests are supposed to be as close to trivial as possible--the
simplest input that exercises a feature.  I would run the gcc grammar on
all the GNU sources and the Linux kernel and stuff.  If I found an error I would
narrow it down to as simple a file as would reproduce the problem.  Then I'd
fix the rule and see the test pass.  
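
The workflow in points 2 and 3 can be sketched as a tiny regression harness.  This is only an illustration under stated assumptions: the class and method names are hypothetical, the test inputs are invented, and a brace-balancing check stands in for the real generated parser, which would normally be called where the Predicate is.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class GrammarRegressionSketch {
    // Runs every named test input through the parser and collects the names that fail.
    static List<String> failingTests(Map<String, String> tests, Predicate<String> parses) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, String> test : tests.entrySet()) {
            boolean ok;
            try {
                ok = parses.test(test.getValue());
            } catch (RuntimeException e) {
                ok = false; // a parser that throws counts as a failed test
            }
            if (!ok) {
                failed.add(test.getKey());
            }
        }
        return failed;
    }

    // Stand-in for a real parser (assumption): accepts input whose braces balance.
    static boolean bracesBalance(String src) {
        int depth = 0;
        for (char c : src.toCharArray()) {
            if (c == '{') depth++;
            if (c == '}' && --depth < 0) return false;
        }
        return depth == 0;
    }

    public static void main(String[] args) {
        // Each test is the simplest input that exercises one feature.
        Map<String, String> tests = new LinkedHashMap<>();
        tests.put("empty_function", "int f() { }");
        tests.put("nested_block", "int f() { { } }");
        tests.put("unclosed_block", "int f() { ");

        System.out.println(failingTests(tests, GrammarRegressionSketch::bracesBalance)); // prints [unclosed_block]
    }
}
```

Because each input is tiny, a name in the failure list points almost directly at the rule that broke, and rerunning the whole set after a grammar change shows whether anything else regressed.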

4.  I agree the tool could be friendlier, but this problem at least is
pretty easy to solve, no?  I mean it is pretty obvious when the wrong token
is generated, you get an "unexpected token" error.  Then you use -traceLexer
to see what's up and find out it's not going to the right rule or that the
rule doesn't exist in the lexer.  Figuring out ambiguities, on the other
hand, now that's a hard problem and would be worth spending the time to make
the tool friendlier for that.  And figuring out linear approximation
weirdness.  However, I digress.

5.  Sure, everyone makes mistakes.  The real question is how long it takes
to recover from the mistake.  People learn from their mistakes too.  Like
when I misplaced a semicolon and didn't figure it out for 4 hours (antlr
2.2.2 didn't warn about these things).  That's when I automated checking in
with every run of antlr.Tool.  Then I could get back to where I was 15
minutes ago and get on with my life.

Monty

-----Original Message-----
From: Greg Lindholm [mailto:glindholm at yahoo.com] 
Sent: Wednesday, June 11, 2003 2:12 PM
To: antlr-interest at yahoogroups.com
Subject: RE: [antlr-interest] Why ANTLR doesn't check existence of lexical
symbols?

--- mzukowski at yci.com wrote:
> Parsers use symbols for AST node types as well.  Just because the 
> parser knows about a symbol doesn't mean the lexer has to generate it.
> 
I didn't realize these were used for AST node types.
But doesn't ANTLR know the difference between an AST node type and
something that it's supposed to match from a token stream?

> I'm having a hard time understanding how someone can mentally add a 
> new token to a parser rule but not put that token into a lexer.
> 
I don't make coding mistakes on purpose. It's more a problem of typos,
forgetting and just plain mistakes.  The point is, a friendly tool should
warn you of any obvious errors that it is able to detect.

> At the very least these are trivially caught by your unit tests which 
> exercise your grammar rules.
> 
IMHO I don't believe the words "trivial", "unit tests" and "grammar"
belong in the same sentence :)   Unless you mean "trivial unit tests" or
"trivial grammar" :)

Greg

> Monty
> 
> -----Original Message-----
> From: Greg Lindholm [mailto:glindholm at yahoo.com]
> Sent: Wednesday, June 11, 2003 1:25 PM
> To: antlr-interest at yahoogroups.com
> Subject: Re: [antlr-interest] Why ANTLR doesn't check existence of
> lexical symbols?
> 
> 
> Hi Ter,
> 
> It seems to me that there are 2 cases:
> 1) the lexer is in the same file or
> 2) the parser does an import of the token symbols
> 
> In either case, if ANTLR encounters undefined tokens in a parser and 
> has to generate new symbols, I think you've got a problem that at the 
> very least deserves a warning message.
> 
> Because if the lexer doesn't know about the symbol, it's never going to 
> create a token of that type.
> 
> Or am I missing something?
> 
> 
> --- Terence Parr <parrt at jguru.com> wrote:
> > 
> > On Wednesday, June 11, 2003, at 02:11  AM, Hrvoje Nezic wrote:
> > 
> > > Hi,
> > >
> > > > If some lexical symbol is referenced in parser grammar, but is not 
> > > > actually defined in lexer, ANTLR doesn't generate error
> > > > or warning messages, so this can be detected only at runtime on
> > > parser testing. I find it very inconvenient, because you have to 
> > > check existence of token symbols manually.
> > > Is there any reason why ANTLR behaves like this, and
> > > is there any workaround?
> > 
> > The problem is that ANTLR can be hooked up to any TokenStream object.
> > Further, the lexer may not be defined in the same grammar file. 
> > ANTLR cannot answer this question, I guess is the answer (though not 
> > the one you are hoping for) ;)
> > 
> > Terence
> > --
> > Co-founder, http://www.jguru.com
> > Creator, ANTLR Parser Generator: http://www.antlr.org
> > Co-founder, http://www.peerscope.com link sharing, pure-n-simple
> > Lecturer in Comp. Sci., University of San Francisco
> > 
> > 
> >  
> > 
> > Your use of Yahoo! Groups is subject to
> > http://docs.yahoo.com/info/terms/
> > 
> > 