[antlr-interest] Re: More ANTLR meta-syntax questions
lgcraymer
lgc at mail1.jpl.nasa.gov
Tue May 14 12:05:47 PDT 2002
> On Tuesday, May 14, 2002, at 01:13 AM, Brian Smith wrote:
> > (1) Why does ANTLR bother with separating the lexical rules from the
> > parsing rules? In particular, it requires that lexical rules start with
> > an uppercase letter and parser rules start with a lowercase letter, and
> > the rules are defined in separate sections. Is it just an enforced
> > (historical) convention or is there a philosophical reason for it? I
The reason is historical ANTLR implementation practice--right now, the
grammar analysis depends on being able to distinguish token
identifiers from rule identifiers. That is done during parsing of
grammar files because ANTLR uses a custom tree structure internally.
If ANTLR were changed to use ASTs internally, it would be possible to
maintain tables of lexer identifiers for use in the analysis phase and
do away with the lexer/parser rule naming conventions. I don't know
if that is a good idea or not--the current distinction may help
readability.
> > have found that even with my first grammar (a modified OCL grammar), I
> > wanted to move rules between the parser and the lexer fairly freely but
That is unusual--most of the time, you can just cut and paste a lexer
grammar from one of the ANTLR examples and modify as appropriate.
> > each time I was forced to do a search and replace on the grammar file to
> > change letter case, even though the rule body was exactly the same.
> > Also, I have found that the ANTLR syntax makes the distinction between
> > lexer and parser seem almost arbitrary since they are both specified
> > with EBNF.
It does the same for tree grammars, too, and that can be confusing
because ANTLR uses only one token of lookahead/lookdown (trees are not
streams) when treewalking. Also, lexer rules cannot have tree
annotations ( ^ ! ), and parsers can only apply ^ to tokens. EBNF
describes the common syntax of lexers, parsers, and tree walkers;
however, each type of grammar extends this base. Lexer grammars
have characters ('a'), parsers and tree walkers have tree annotations
(^ !), and tree walkers have roots ( #( ).
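
To make the three extensions concrete, here is a minimal sketch of a
single ANTLR 2 grammar file holding a parser, a lexer, and a tree walker.
The class and rule names (CalcParser, CalcLexer, CalcWalker, stat, expr)
are made up for illustration, and the grammar is deliberately incomplete
(for example, it has no whitespace rule); treat it as a rough
illustration of the notation rather than a tested grammar.

```antlr
// Hypothetical example; all class and rule names are illustrative only.
class CalcParser extends Parser;
options { buildAST = true; }

// Parser rules have lowercase names. The ^ suffix makes the PLUS token
// a subtree root; the ! suffix keeps SEMI out of the tree.
stat : expr SEMI! ;
expr : INT (PLUS^ INT)* ;

class CalcLexer extends Lexer;

// Lexer rules have uppercase names and are built from character literals.
PLUS : '+' ;
SEMI : ';' ;
INT  : ('0'..'9')+ ;

class CalcWalker extends TreeParser;

// Tree-walker rules can use #( root children... ) to match a subtree.
expr : #(PLUS expr expr)
     | INT
     ;
```

All three sections share the same EBNF operators ( | ( ) * + ), but
character literals appear only in the lexer, the ^ and ! annotations only
in the parser here, and the #( root pattern only in the tree walker.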
--Loring Craymer