[antlr-interest] Re: More ANTLR meta-syntax questions

lgcraymer lgc at mail1.jpl.nasa.gov
Tue May 14 12:05:47 PDT 2002


> On Tuesday, May 14, 2002, at 01:13  AM, Brian Smith wrote:
> > (1) Why does ANTLR bother with separating the lexical rules from the
> > parsing rules? In particular, it requires that lexical rules start with
> > an uppercase letter and parser rules start with a lowercase letter, and
> > the rules are defined in separate sections. Is it just an enforced
> > (historical) convention or is there a philosophical reason for it? I

The reason is historical ANTLR implementation practice--right now, the
grammar analysis depends on being able to distinguish token
identifiers from rule identifiers.  That distinction is made while the
grammar file is being parsed, because ANTLR uses a custom tree
structure internally.  If ANTLR were changed to use ASTs internally, it
would be possible to maintain tables of lexer identifiers for use in
the analysis phase and do away with the lexer/parser rule naming
conventions.  I don't know if that is a good idea or not--the current
distinction may help readability.
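
To make the two approaches concrete, here is a minimal Python sketch.
It is an illustration only, not ANTLR's actual code; the function
names and the sample identifiers are hypothetical.  The first function
classifies an identifier by the case of its first letter (the current
convention); the second looks it up in a table of lexer rule names
collected beforehand, which is what would let the naming convention go
away.

```python
def classify_by_case(name):
    """Current convention: an uppercase first letter marks a token
    (lexer rule); a lowercase first letter marks a parser rule."""
    return "token" if name[0].isupper() else "rule"


def classify_by_table(name, lexer_names):
    """Hypothetical alternative: consult a table of lexer rule names
    gathered before analysis, so letter case no longer matters."""
    return "token" if name in lexer_names else "rule"


# Convention-based classification needs nothing but the name itself.
by_case = {n: classify_by_case(n) for n in ["expr", "INT", "PLUS"]}

# Table-based classification works even with all-lowercase names.
lexer_names = {"int", "plus"}  # assumed to come from the lexer section
by_table = {n: classify_by_table(n, lexer_names)
            for n in ["expr", "int", "plus"]}
```

The case-based version can be applied the moment an identifier is
read; the table-based version needs the full set of lexer rule names
before any reference can be classified.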

> > have found that even with my first grammar (a modified OCL grammar), I
> > wanted to move rules between the parser and the lexer fairly freely but

That is unusual--most of the time, you can just cut and paste a lexer 
grammar from one of the ANTLR examples and modify as appropriate.

> > each time I was forced to do a search and replace on the grammar file to
> > change letter case, even though the rule body was exactly the same.
> > Also, I have found that the ANTLR syntax makes the distinction between
> > lexer and parser seem almost arbitrary since they are both specified
> > with EBNF.

It does the same for tree grammars, too, and that can be confusing
because ANTLR uses only one token of lookahead/lookdown (trees are not
streams) when treewalking.  Also, lexer rules cannot have tree
annotations ( ^ ! ), and parsers can only apply ^ to tokens.  EBNF
describes the common syntax of lexers, parsers, and tree walkers;
however, each type of grammar extends this base.  Lexer grammars
have characters ('a'); parsers and tree walkers have tree annotations
(^ !); and tree walkers have roots ( #( ).

--Loring Craymer





 
