[antlr-interest] More ANTLR meta-syntax questions

Terence Parr parrt at jguru.com
Tue May 14 10:55:26 PDT 2002


On Tuesday, May 14, 2002, at 01:13  AM, Brian Smith wrote:
> (1) Why does ANTLR bother with seperating the lexical rules from the
> parsing rules? In particular, it requires that lexical rules start with
> an uppercase letter and parser rules start with a lowercase letter, and
> the rules are defined in seperate sections. Is it just an enforced
> (historical) convention or is there a philisophical reason for it? I
> have found that even with my first grammer (a modified OCL grammar), I
> wanted to move rules between the parser and the lexer fairly freely but
> each time I was forced to do a search and replace on the grammar file to
> change letter case, even though the rule body was exactly the same.
> Also, I have found that the ANTLR syntax makes the distinction between
> lexer and parser seem almost arbitrary since they are both specified
> with EBNF.

Take a look at my "building a translator by hand" PDF draft chapter of 
my book at the antlr.org site.   Look for "The Limitations of Combined 
Syntactic and Lexical Recognizers". It explains that you need a queue of 
tokens between lexer and parser (at least with k>1 top-down approach...I 
suspect that LR-based systems can do without it much of the time).  The 
difference is doing either:

1) go get me an identifier

vs

2) get a token (and then let the parser branch on what you find)

> (2) In ANTLR, you have to specify a lookahead distince (k) for the
> parser. My OCL grammar is LL(1) for all rules except one, which requires
> 2 tokens of lookahead. So, I changed my grammar to be specified as
> LL(2). Does this affect the performance of my LL(1) rules? Is there a
> runtime performance penalty to saying k=2 for rules where k=1 is

No.  ANTLR uses minimum necessary.

> sufficient? If not, is it (theoretically) possible for ANTLR to compute
> the minimum value of k needed for my grammar to be unambiguous?

Yes. this is automatic.  Slick, eh?

> (4) I know that ANTLR handles grammars that are "approximately" LL(k),
> and that LL(k) is a subset of LALR(1). But, is LL(k)+syntactic
> predicates still a subset of LALR(1)? That is, are there LALR(1),
> LALR(k), or LR(k) grammars that cannot be specified in ANTLR's
> LL(k)+syntactic predicates?

LR reduces to LL(k) if I put actions in the "right" spot. ;)  Nice 
little proof of this.  Anyway, adding syntactic predicates to any parser 
technology takes you beyond full LR(k) in strength as it is essentially 
backtracking.  There are some bizarre cases where backtracking won't 
even let you parse all unambiguous grammars, but it's pretty close.  
LL+syn preds is very powerful but still you need to avoid using them too 
freely for speed reasons.  I'm guessing the answer to your last question 
is that in theory LL+preds can handle anything LR(k) for fixed k can 
handle.

Add semantic predicates and ANTLR kicks all their butts! ;)

I got really pissed off the other day when i was updating the Java 
grammar--I wanted full LL(k) rather than my approx decisions...almost 
made me take a week off to go implement it! ;)

Ter
--
Co-founder, http://www.jguru.com
Creator, ANTLR Parser Generator: http://www.antlr.org


 

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/ 



More information about the antlr-interest mailing list