[antlr-interest] Comparing ASTs of the two Java1.5 grammars
atripp54321
atripp at comcast.net
Mon Oct 25 12:25:45 PDT 2004
I went to update my JavaEmitter code for the new JDK1.5 grammar,
and I see we actually have two JDK1.5 grammars listed at antlr.org:
one by Michael Studman and another by Michael Stahl.
My code depends on the "shape" of the Java AST produced
by the grammar, and I'm sure eventually one of these two will
need to be chosen to be included with ANTLR as the "official" java.g.
So I tried out these two grammars on the
various new 1.5 features, and here are my notes on
the ASTs that each of them produces.
For reference, here's the Sun proposed Java 1.5 grammar:
http://java.sun.com/docs/books/jls/jls-proposed-changes.html
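The tree listings below are indented dumps, where each child sits one level deeper than its parent. ANTLR 2 stores ASTs in first-child/next-sibling form (`antlr.collections.AST` has `getFirstChild()` and `getNextSibling()`), so producing such a dump is a short recursive walk. The sketch below uses a hypothetical stand-in `Node` class instead of ANTLR's own types so it compiles on its own. `AstDump`, `Node`, `n`, `id` and `dump` are illustrative names, not ANTLR API.

```java
/** Minimal sketch of an indented AST dump over a first-child/next-sibling tree. */
class AstDump {
    // Hypothetical stand-in for an ANTLR 2 AST node: first child + next sibling.
    static final class Node {
        final String type, text;  // token type name, and token text (used for IDENT)
        Node firstChild, nextSibling;

        Node(String type, String text, Node... kids) {
            this.type = type;
            this.text = text;
            Node prev = null;  // link the kids into a sibling chain under this node
            for (Node kid : kids) {
                if (prev == null) firstChild = kid; else prev.nextSibling = kid;
                prev = kid;
            }
        }
    }

    // Hypothetical helpers: an imaginary node, and an IDENT carrying a name.
    static Node n(String type, Node... kids) { return new Node(type, type, kids); }
    static Node id(String name, Node... kids) { return new Node("IDENT", name, kids); }

    /** One line per node; each child is indented two spaces more than its parent. */
    static String dump(Node root) {
        StringBuilder out = new StringBuilder();
        dump(root, 0, out);
        return out.toString();
    }

    private static void dump(Node node, int depth, StringBuilder out) {
        // Visit this node and its following siblings, recursing into each one's children.
        for (Node cur = node; cur != null; cur = cur.nextSibling) {
            for (int i = 0; i < depth; i++) out.append("  ");
            out.append(cur.type);
            if (cur.type.equals("IDENT")) out.append(' ').append(cur.text);
            out.append('\n');
            dump(cur.firstChild, depth + 1, out);
        }
    }
}
```

Building Studman's for-each tree with `n(...)`/`id(...)` and passing it to `dump` reproduces the indented layout used in the for-each section.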
1) Annotations
Neither grammar stores annotations in the AST.
This seems right to me, as we don't store comments in the AST either.
Anyone who's annoyed that comments are not stored in the AST
will now be even more annoyed :)
2) Generics:
Given this code:
public Vector(Collection<? extends E> c) {
Studman's produces this:
TYPE
  IDENT Collection
    TYPE_ARGUMENTS
      TYPE_ARGUMENT
        WILDCARD_TYPE
          TYPE_UPPER_BOUNDS
            IDENT E
And Stahl's produces this:
TYPE
  IDENT Collection
  TYPE_ARGS
    WILDCARD
      LITERAL_extends
      TYPE
        IDENT E
        TYPE_ARGS
a) Studman's places the type-argument subtree (TYPE_ARGUMENTS) as a child
of the IDENT; Stahl's places it (TYPE_ARGS) as a sibling of the IDENT.
I prefer Stahl's; it seems strange for an IDENT to have a child.
b) Studman's has the extra TYPE_ARGUMENT node, which I prefer.
c) The two trees differ below the wildcard node (WILDCARD_TYPE vs. WILDCARD).
I prefer Studman's, but I'd rename "TYPE_UPPER_BOUNDS" to "TYPE_EXTENDS"
(and "TYPE_LOWER_BOUNDS" to "TYPE_SUPER").
d) That extra TYPE_ARGS at the end of Stahl's shouldn't be there, I think,
since E has no type arguments.
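Point (a) is exactly the kind of difference an emitter has to code around. Below is a sketch of a lookup that finds the type arguments of a TYPE node under either layout, and that ignores the empty TYPE_ARGS Stahl's grammar also attaches to non-generic types (as under Integer and String in the for-each and varargs examples). It uses a hypothetical first-child/next-sibling `Node` stand-in, not ANTLR's `antlr.collections.AST`; `TypeArgs` and `typeArguments` are illustrative names. The nesting inside Stahl's WILDCARD node used in the test is assumed, and the lookup does not depend on it.

```java
/** Sketch: locate a TYPE node's type arguments in either grammar's layout. */
class TypeArgs {
    // Hypothetical stand-in for an ANTLR 2 AST node: first child + next sibling.
    static final class Node {
        final String type, text;
        Node firstChild, nextSibling;

        Node(String type, String text, Node... kids) {
            this.type = type;
            this.text = text;
            Node prev = null;  // link the kids into a sibling chain under this node
            for (Node kid : kids) {
                if (prev == null) firstChild = kid; else prev.nextSibling = kid;
                prev = kid;
            }
        }
    }

    static Node n(String type, Node... kids) { return new Node(type, type, kids); }
    static Node id(String name, Node... kids) { return new Node("IDENT", name, kids); }

    /** Returns the node holding the type arguments, or null if the type has none. */
    static Node typeArguments(Node typeNode) {
        Node ident = typeNode.firstChild;
        if (ident == null) return null;
        // Studman's layout: TYPE_ARGUMENTS hangs under the IDENT.
        Node child = ident.firstChild;
        if (child != null && child.type.equals("TYPE_ARGUMENTS")) return child;
        // Stahl's layout: TYPE_ARGS follows the IDENT as a sibling and is
        // present, but empty, even when the type is not generic.
        Node sibling = ident.nextSibling;
        if (sibling != null && sibling.type.equals("TYPE_ARGS") && sibling.firstChild != null) {
            return sibling;
        }
        return null;
    }
}
```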
3) For-each loop:
Given this code:
for (Integer i : integers) {
}
Studman's produces this:
LITERAL_for
  FOR_EACH_CLAUSE
    PARAMETER_DEF
      MODIFIERS
      TYPE
        IDENT Integer
      IDENT i
    EXPR
      IDENT integers
  SLIST
And Stahl's produces this:
LITERAL_for
  PARAMETER_DEF
    MODIFIERS
    TYPE
      IDENT Integer
      TYPE_ARGS
    IDENT i
  EXPR
    IDENT integers
  SLIST
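For an emitter, the difference shows up when telling a for-each loop from a classic one. With Studman's tree the first child of LITERAL_for says so directly; with Stahl's, a for-each has to be inferred from a PARAMETER_DEF sitting where a classic loop has FOR_INIT. A sketch that returns the loop variable's name under either layout, or null for a classic loop, using a hypothetical first-child/next-sibling `Node` stand-in (not ANTLR API; `ForEach`, `loopParameter` and `loopVariableName` are illustrative names):

```java
/** Sketch: recognize a for-each loop in either grammar's AST layout. */
class ForEach {
    // Hypothetical stand-in for an ANTLR 2 AST node: first child + next sibling.
    static final class Node {
        final String type, text;
        Node firstChild, nextSibling;

        Node(String type, String text, Node... kids) {
            this.type = type;
            this.text = text;
            Node prev = null;  // link the kids into a sibling chain under this node
            for (Node kid : kids) {
                if (prev == null) firstChild = kid; else prev.nextSibling = kid;
                prev = kid;
            }
        }
    }

    static Node n(String type, Node... kids) { return new Node(type, type, kids); }
    static Node id(String name, Node... kids) { return new Node("IDENT", name, kids); }

    /** Returns the loop variable's PARAMETER_DEF, or null for a classic for loop. */
    static Node loopParameter(Node forNode) {
        Node first = forNode.firstChild;
        if (first == null) return null;
        // Studman's layout: FOR_EACH_CLAUSE wraps the parameter and the expression.
        if (first.type.equals("FOR_EACH_CLAUSE")) return first.firstChild;
        // Stahl's layout: PARAMETER_DEF is a direct child of LITERAL_for.
        if (first.type.equals("PARAMETER_DEF")) return first;
        return null;  // e.g. FOR_INIT: a classic for loop
    }

    /** Returns the loop variable's name, or null if this is not a for-each loop. */
    static String loopVariableName(Node forNode) {
        Node param = loopParameter(forNode);
        if (param == null) return null;
        // The name is the IDENT directly under PARAMETER_DEF, after MODIFIERS and TYPE.
        for (Node c = param.firstChild; c != null; c = c.nextSibling) {
            if (c.type.equals("IDENT")) return c.text;
        }
        return null;
    }
}
```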
I prefer Studman's with the "FOR_EACH_CLAUSE" node, which parallels the
"FOR_INIT", "FOR_CONDITION", and "FOR_ITERATOR" nodes in the old "for" syntax.
4) Enums:
Given this code:
enum Rank2 implements whatever {ONE, TWO, THREE}
Studman's produces this:
ENUM_DEF
  MODIFIERS
  IDENT Rank2
  IMPLEMENTS_CLAUSE
    IDENT whatever
  OBJBLOCK
    ENUM_CONSTANT_DEF
      ANNOTATIONS
      IDENT ONE
    ENUM_CONSTANT_DEF
      ANNOTATIONS
      IDENT TWO
    ENUM_CONSTANT_DEF
      ANNOTATIONS
      IDENT THREE
Stahl's failed with an "unexpected token" exception.
Given a full enum definition, Studman's produced an AST that's identical
to a class definition's, but with ENUM_DEF in place of CLASS_DEF.
Stahl's failed on this one too.
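Because Studman's ENUM_DEF has the same shape as CLASS_DEF, an emitter can share one code path and switch only the keyword. A sketch using a hypothetical first-child/next-sibling `Node` stand-in (not ANTLR API; `EnumEmitter`, `emitTypeDef` and `identsUnder` are illustrative names). It handles only the name, a simple IMPLEMENTS_CLAUSE, and enum constants without arguments or bodies, skipping the ANNOTATIONS child of each ENUM_CONSTANT_DEF; modifiers, EXTENDS_CLAUSE, class members and qualified names are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: one emitter path for both CLASS_DEF and ENUM_DEF subtrees. */
class EnumEmitter {
    // Hypothetical stand-in for an ANTLR 2 AST node: first child + next sibling.
    static final class Node {
        final String type, text;
        Node firstChild, nextSibling;

        Node(String type, String text, Node... kids) {
            this.type = type;
            this.text = text;
            Node prev = null;  // link the kids into a sibling chain under this node
            for (Node kid : kids) {
                if (prev == null) firstChild = kid; else prev.nextSibling = kid;
                prev = kid;
            }
        }
    }

    static Node n(String type, Node... kids) { return new Node(type, type, kids); }
    static Node id(String name, Node... kids) { return new Node("IDENT", name, kids); }

    static String emitTypeDef(Node def) {
        // The root token is the only difference between the two shapes.
        StringBuilder out = new StringBuilder(def.type.equals("ENUM_DEF") ? "enum" : "class");
        List<String> constants = new ArrayList<>();
        for (Node c = def.firstChild; c != null; c = c.nextSibling) {
            if (c.type.equals("IDENT")) {
                out.append(' ').append(c.text);
            } else if (c.type.equals("IMPLEMENTS_CLAUSE") && c.firstChild != null) {
                out.append(" implements ").append(String.join(", ", identsUnder(c)));
            } else if (c.type.equals("OBJBLOCK")) {
                for (Node m = c.firstChild; m != null; m = m.nextSibling) {
                    if (m.type.equals("ENUM_CONSTANT_DEF")) constants.addAll(identsUnder(m));
                }
            }
        }
        return out.append(" {").append(String.join(", ", constants)).append('}').toString();
    }

    /** Texts of a node's IDENT children, skipping MODIFIERS, ANNOTATIONS, etc. */
    static List<String> identsUnder(Node parent) {
        List<String> names = new ArrayList<>();
        for (Node c = parent.firstChild; c != null; c = c.nextSibling) {
            if (c.type.equals("IDENT")) names.add(c.text);
        }
        return names;
    }
}
```

Fed Studman's Rank2 tree, this returns `enum Rank2 implements whatever {ONE, TWO, THREE}`; the same code given a CLASS_DEF with an empty body returns `class Foo {}`.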
5) Varargs:
Given this code:
void test(int i, String... strings)
Studman's produces this:
PARAMETERS
  PARAMETER_DEF
    MODIFIERS
    TYPE
      LITERAL_int
    IDENT i
  VARIABLE_PARAMETER_DEF
    MODIFIERS
    TYPE
      IDENT String
    IDENT strings
And Stahl's produces this:
PARAMETERS
  PARAMETER_DEF
    MODIFIERS
    TYPE
      LITERAL_int
    IDENT i
  PARAMETER_DEF
    MODIFIERS
    TYPE
      IDENT String
      TYPE_ARGS
    ELLIPSIS
    IDENT strings
I prefer Studman's AST with the explicit VARIABLE_PARAMETER_DEF node.
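With an explicit node type, emitting the "..." is a plain node-type check. A sketch that rebuilds a parameter list from Studman's PARAMETERS subtree, using a hypothetical first-child/next-sibling `Node` stand-in (not ANTLR API; `ParamEmitter`, `emitParameters` and `kw` are illustrative names). It assumes each TYPE has a single simple child, a primitive keyword or an unqualified class name, so arrays, generics and qualified names are out of scope.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: emit a parameter list, honoring Studman's VARIABLE_PARAMETER_DEF. */
class ParamEmitter {
    // Hypothetical stand-in for an ANTLR 2 AST node: first child + next sibling.
    static final class Node {
        final String type, text;
        Node firstChild, nextSibling;

        Node(String type, String text, Node... kids) {
            this.type = type;
            this.text = text;
            Node prev = null;  // link the kids into a sibling chain under this node
            for (Node kid : kids) {
                if (prev == null) firstChild = kid; else prev.nextSibling = kid;
                prev = kid;
            }
        }
    }

    static Node n(String type, Node... kids) { return new Node(type, type, kids); }
    static Node id(String name, Node... kids) { return new Node("IDENT", name, kids); }
    /** A keyword token such as LITERAL_int, whose text is the keyword itself. */
    static Node kw(String type, String text) { return new Node(type, text); }

    static String emitParameters(Node params) {
        List<String> out = new ArrayList<>();
        for (Node p = params.firstChild; p != null; p = p.nextSibling) {
            // The varargs parameter is marked by its own node type.
            boolean varargs = p.type.equals("VARIABLE_PARAMETER_DEF");
            String typeName = null, name = null;
            for (Node c = p.firstChild; c != null; c = c.nextSibling) {
                if (c.type.equals("TYPE")) typeName = c.firstChild.text;  // simple types only
                else if (c.type.equals("IDENT")) name = c.text;
            }
            out.add(typeName + (varargs ? "... " : " ") + name);
        }
        return "(" + String.join(", ", out) + ")";
    }
}
```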
6) Static imports:
Given this code:
import static java.lang.Math.PI;
Studman's produces this:
STATIC_IMPORT
  DOT
    DOT
      DOT
        IDENT java
        IDENT lang
      IDENT Math
    IDENT PI
And Stahl's produces this:
IMPORT
  LITERAL_static
  DOT
    DOT
      DOT
        IDENT java
        IDENT lang
      IDENT Math
    IDENT PI
I prefer Studman's STATIC_IMPORT. The issue here is whether a "static import"
is just an "import" that happens to have a "static" modifier (as when a
variable is static), or whether it's a new type of thing (in the way that a
"static block" differs from a regular block).
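Whichever root is chosen, an emitter also has to flatten the DOT subtree back into a dotted name. A sketch using a hypothetical first-child/next-sibling `Node` stand-in (not ANTLR API; `ImportEmitter`, `dotted` and `emitImport` are illustrative names), assuming the left-nested DOT shape that the existing java.g builds for qualified names. It accepts Studman's STATIC_IMPORT root and also an IMPORT whose first child is LITERAL_static; that second shape is read off Stahl's listing and is an assumption.

```java
/** Sketch: emit an import statement from either grammar's AST. */
class ImportEmitter {
    // Hypothetical stand-in for an ANTLR 2 AST node: first child + next sibling.
    static final class Node {
        final String type, text;
        Node firstChild, nextSibling;

        Node(String type, String text, Node... kids) {
            this.type = type;
            this.text = text;
            Node prev = null;  // link the kids into a sibling chain under this node
            for (Node kid : kids) {
                if (prev == null) firstChild = kid; else prev.nextSibling = kid;
                prev = kid;
            }
        }
    }

    static Node n(String type, Node... kids) { return new Node(type, type, kids); }
    static Node id(String name, Node... kids) { return new Node("IDENT", name, kids); }

    /** Flattens a left-nested DOT tree, e.g. DOT(DOT(java, lang), Math), to "java.lang.Math". */
    static String dotted(Node node) {
        if (node.type.equals("DOT")) {
            return dotted(node.firstChild) + "." + dotted(node.firstChild.nextSibling);
        }
        return node.text;
    }

    static String emitImport(Node imp) {
        boolean isStatic = imp.type.equals("STATIC_IMPORT");  // Studman's layout
        Node name = imp.firstChild;
        if (name != null && name.type.equals("LITERAL_static")) {  // assumed Stahl layout
            isStatic = true;
            name = name.nextSibling;
        }
        return "import " + (isStatic ? "static " : "") + dotted(name) + ";";
    }
}
```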
Summary:
Assuming both grammars parse Java 1.5 code correctly (which they seem
to, except for the enum problem in Stahl's noted above), choosing one of
them to be the "official" java.g comes down to which produces the "better" AST.
I've listed the differences, and to me Studman's ASTs look more
consistent with the ASTs we get today.
And of course, some guru should look closely at the chosen grammar to make
sure that it matches the "official" grammar in the JLS, add comments as
needed, make sure token names are consistent, etc.
Andy