[antlr-interest] suggestions for 2.7.2

Thu Jan 3 00:10:28 PST 2002

Hi Ter,

in a message on jguru.com you asked for suggestions for 2.7.2.
Well .... here are my suggestions (hope you don't mind I use
yahoo instead of jguru; I prefer email).

* With antlr you specify a lookahead for the complete grammar;
it is not possible to specify a lookahead per (sub)rule. I believe
javacc does support this, and I think it would be great if antlr
would allow different lookaheads on (sub)rule basis.
(That would eliminate many of the syntactic predicates I have
to use now).

   e.g.   myrule {options k=4;} :  A B C D | A B C E ;

* it used to be possible to have multiple lexers and parsers in
one grammar file. But that has been disabled (for whatever reason).
It would be great if this was enabled again, cause now i have to
put lexers in multiple grammar files (antlr doesn't support states, so
I need multiple lexers).

* as said in previous point, the lexer in antlr doesn't support
lexical states; the alternative is to use multiple lexers.
However, now I have to duplicate several lexer rules accross
different lexers; that's not a good thing!
Now, if it were possible to specify semantic predicates with
the "highest" lexer rules (the ones that match a complete token),
then I could mimic lexer states using semantic predicates!
That would be a great improvement, e.g.

    TOKEN_A  :  (mystate == 1)? "a" ;
    TOKEN_A2 : (mystate == 2)? "a" ;

* another problem with lexers in antlr is the fact that I have to
combine conflicting rules into a common rule and use setType
to return the correct token.
Couldn't it be possible to specify syntactic predicates with the
"highest" lexer rules (the ones that match a complete token).
The first lexer rule would be tried first, and then the next one, etc.
(similar to multiple sub-rules with syntactic predicates).
That way I could keep my lexer a lot more readable/maintainable.

e.g.
    TOKEN_FLOAT: (INT '.')=> INT '.' INT ;
    TOKEN_INTEGER:  INT ;

* with grammar inheritance it is possible to redefine a rule, but
as far as I know it is not possible to extend a rule without having
to repeat it's contents.
e.g    myRule : super.myRule | someExtraRule ;

* when using grammar inheritance I have to specify all base
grammars on the command line. It would be nice if I could
specificy a list of search directories instead. (easier to manage)

* the java code of the antlr parser generator is mixed with the
java code for the antlr runtime. It would be GREAT if you
could move the runtime classes to a separate package so there
is a clear distinction between runtime antlr classes and parser
generator classes.

* although resolving conflicts in antlr is a lot easier than in yacc,
it can still be difficult to understand why a conflict occurs.
It would be great if antlr would report the two different derivations
that conflict (and not only the rule where the conflict occurs).

* it would be nice if antlr would generate a _nextToken() function
instead of nextToken(). The default implementation of nextToken() would
just call _nextToken(), but I would have the possibility of redefining
the nextToken() function.

* I use {options greedy=true} a lot; can't you introduce a special
symbol for this construct? e.g. [a]?  or  [a]* where the brackets
indicate greedy (or something like that)

* why do I have to make all keywords lower case when specifying
that a parser is case-insensitive? For certain languages it is more
intuitive to put upper-case keywords in the grammar (e.g. COBOL).
It would be nice if case-insensitive would also apply to the keywords
I put in the grammar myself.

* i have several parsers, and each parser defines some common
support methods/functions. Currently these functions are duplicated
between the parsers. I tried to put them in a common class
and derive my parsers from this common class, but antlr insists
that i defined this common class in a .g file; it can't be a plain java
class (that extends class LLkParser).
It would be nice if I could derive my parsers from my own
parser class; not only from an antlr generated parser class.

* a nice feature I found in a different tool was the concept of defining
parser rules for "hidden" tokens. This can be used to parse
preprocessor statements, like (#include or COPY in cobol) by
specifying special grammar rules in the parser.
The concept is simple: hidden tokens are ignored unless there
are "special" rules that matche them.
(I'm not sure how this could be integrated in antlr, but I liked the
concept).

* finally, I understand that the way you combine lookahead sets
can cause incorrect parsers (I experienced that just a week ago ...).
It would be nice if I could overrule the "merging of lookahead sets"
for specific rules.

Kind greetings, Stdiobe

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/