[antlr-interest] Comparing ASTs of the two Java1.5 grammars

atripp54321 atripp at comcast.net
Mon Oct 25 12:25:45 PDT 2004



I went to update my JavaEmitter code for the new JDK1.5 grammar,
and I see we actually have two JDK1.5 grammars listed at antlr.org:
one by Michael Studman and another by Michael Stahl.
My code depends on the "shape" of the Java AST produced
by the grammar, and I'm sure eventually one of these two will
need to be chosen to be included with ANTLR as the "official" java.g.

So I tried out these two grammars on the
various new 1.5 features, and here are my notes on
the ASTs that each of these grammars produces.
For reference, here's the Sun proposed Java 1.5 grammar:
http://java.sun.com/docs/books/jls/jls-proposed-changes.html

1) Annotations
Neither grammar stores annotations in the AST.
This seems right to me, as we don't store comments in the AST either.
Anyone who's annoyed that comments are not stored in the AST
will now be even more annoyed :)

2) Generics:
Given this code:
    public Vector(Collection<? extends E> c) {

Studman's produces this:
            TYPE
              IDENT Collection
                TYPE_ARGUMENTS
                  TYPE_ARGUMENT
                    WILDCARD_TYPE
                      TYPE_UPPER_BOUNDS
                        IDENT E
 
And Stahl's produces this:
           TYPE
             IDENT Collection
             TYPE_ARGS
               WILDCARD
                 LITERAL_extends
                 TYPE
                   IDENT E
                   TYPE_ARGS	

a) One places the type-arguments subtree as a child of IDENT (Studman's),
the other as a sibling of IDENT (Stahl's).
I prefer Stahl's... it seems strange for IDENT to have a child.
b) Studman's has the extra TYPE_ARGUMENT node, which I prefer.
c) The two trees are different under the wildcard node (WILDCARD_TYPE vs.
WILDCARD). I prefer Studman's, but I'd rename "TYPE_UPPER_BOUNDS" to
"TYPE_EXTENDS" (and "TYPE_LOWER_BOUNDS" to "TYPE_SUPER").
d) That extra TYPE_ARGS at the end of Stahl's shouldn't be there (I think).
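
To make the "shape" dependency concrete, here is a minimal sketch of how
tree-walking code has to differ between the two ASTs above just to find the
wildcard's upper bound. The Node class and the helper names are hypothetical
stand-ins for illustration, not ANTLR's AST interface, and the trees are built
by hand to mirror the dumps rather than produced by either grammar.

```java
import java.util.ArrayList;
import java.util.List;

class GenericsShapeDemo {
    // Hypothetical minimal tree node (an assumption, not antlr.collections.AST).
    static final class Node {
        final String type;
        final String text;
        final List<Node> children = new ArrayList<>();
        Node(String type, String text) { this.type = type; this.text = text; }
        Node add(Node... kids) { for (Node k : kids) children.add(k); return this; }
        // First child with the given token type, or null if there is none.
        Node child(String wanted) {
            for (Node c : children) if (c.type.equals(wanted)) return c;
            return null;
        }
    }
    static Node n(String type) { return new Node(type, null); }
    static Node id(String text) { return new Node("IDENT", text); }

    // Studman's shape: TYPE_ARGUMENTS hangs under IDENT.
    static Node studman() {
        Node bounds = n("TYPE_UPPER_BOUNDS").add(id("E"));
        Node args = n("TYPE_ARGUMENTS").add(
                n("TYPE_ARGUMENT").add(n("WILDCARD_TYPE").add(bounds)));
        return n("TYPE").add(id("Collection").add(args));
    }

    // Stahl's shape: TYPE_ARGS is a sibling of IDENT.
    static Node stahl() {
        Node wildcard = n("WILDCARD").add(
                n("LITERAL_extends"),
                n("TYPE").add(id("E"), n("TYPE_ARGS")));
        return n("TYPE").add(id("Collection"), n("TYPE_ARGS").add(wildcard));
    }

    // Each lookup is tied to one shape and would hit null on the other.
    static String boundStudman(Node type) {
        return type.child("IDENT").child("TYPE_ARGUMENTS").child("TYPE_ARGUMENT")
                .child("WILDCARD_TYPE").child("TYPE_UPPER_BOUNDS").child("IDENT").text;
    }
    static String boundStahl(Node type) {
        return type.child("TYPE_ARGS").child("WILDCARD").child("TYPE").child("IDENT").text;
    }

    public static void main(String[] args) {
        System.out.println(boundStudman(studman()));
        System.out.println(boundStahl(stahl()));
    }
}
```

Both lookups reach the same IDENT, but neither path works on the other
grammar's tree, which is why the choice of AST matters to downstream code.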

2) For-each loop:
Given this code:
                for (Integer i : integers) {
                }

Studman's produces this:
         LITERAL_for
           FOR_EACH_CLAUSE
             PARAMETER_DEF
               MODIFIERS
               TYPE
                 IDENT Integer
               IDENT i
             EXPR
               IDENT integers
           SLIST

And Stahl's produces this:
          LITERAL_for
            PARAMETER_DEF
              MODIFIERS
              TYPE
                IDENT Integer
                TYPE_ARGS
              IDENT i
            EXPR
              IDENT integers
            SLIST

I prefer Studman's with the "FOR_EACH_CLAUSE" node, which parallels the
"FOR_INIT", "FOR_CONDITION", and "FOR_ITERATOR" nodes in the old "for" syntax.

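A rough sketch of what this means for a tree walker that has to tell the two
kinds of "for" apart. A LITERAL_for node is reduced here to the list of its
children's token-type names, which is a simplification for illustration. The
child order assumed for the old-style loop (FOR_INIT, FOR_CONDITION,
FOR_ITERATOR, then the body) is my assumption about the existing java.g, and
the class and method names are made up.

```java
import java.util.Arrays;
import java.util.List;

class ForShapeDemo {
    // With Studman's AST the first child always names the kind of loop.
    static String classifyStudman(List<String> forChildren) {
        String first = forChildren.get(0);
        if (first.equals("FOR_EACH_CLAUSE")) return "for-each";
        if (first.equals("FOR_INIT")) return "classic";
        return "unknown";
    }

    // With Stahl's AST a for-each is recognized by a bare PARAMETER_DEF.
    static String classifyStahl(List<String> forChildren) {
        String first = forChildren.get(0);
        if (first.equals("PARAMETER_DEF")) return "for-each";
        if (first.equals("FOR_INIT")) return "classic";
        return "unknown";
    }

    public static void main(String[] args) {
        List<String> studmanEach = Arrays.asList("FOR_EACH_CLAUSE", "SLIST");
        List<String> stahlEach = Arrays.asList("PARAMETER_DEF", "EXPR", "SLIST");
        List<String> classic =
                Arrays.asList("FOR_INIT", "FOR_CONDITION", "FOR_ITERATOR", "SLIST");
        System.out.println(classifyStudman(studmanEach));
        System.out.println(classifyStahl(stahlEach));
        System.out.println(classifyStudman(classic));
    }
}
```
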
3) Enums:
Given this code:
   enum Rank2 implements whatever {ONE, TWO, THREE}
Studman's produces this:
      ENUM_DEF
        MODIFIERS
        IDENT Rank2
        IMPLEMENTS_CLAUSE
          IDENT whatever
        OBJBLOCK
          ENUM_CONSTANT_DEF
            ANNOTATIONS
            IDENT ONE
          ENUM_CONSTANT_DEF
            ANNOTATIONS
            IDENT TWO
          ENUM_CONSTANT_DEF
            ANNOTATIONS
            IDENT THREE

Stahl's failed with an "unexpected token" exception.

Given a full enum definition, Studman's produced an AST that's identical
to a class definition, but with ENUM_DEF in place of CLASS_DEF.
Stahl's failed on this one too.
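
For readers unsure what a "full" enum definition means here: an enum whose body
has fields, a constructor, and methods, just like a class body. The exact
source used in the test above is not shown, so the following is only an
illustrative example with made-up names (FullEnumDemo, Labeled, Rank).

```java
class FullEnumDemo {
    // Hypothetical interface, standing in for "whatever" in the short example.
    interface Labeled {
        String label();
    }

    // An enum with constructor arguments, a field, and methods.
    enum Rank implements Labeled {
        ONE(1), TWO(2), THREE(3);

        private final int value;

        Rank(int value) { this.value = value; }

        int value() { return value; }

        public String label() { return name() + "=" + value; }
    }

    public static void main(String[] args) {
        for (Rank r : Rank.values()) {
            System.out.println(r.label());
        }
    }
}
```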

4) Varargs:
Given this code:
	void test(int i, String... strings)

Studman's produces this:
        PARAMETERS
          PARAMETER_DEF
            MODIFIERS
            TYPE
              LITERAL_int
            IDENT i
          VARIABLE_PARAMETER_DEF
            MODIFIERS
            TYPE
              IDENT String
            IDENT strings

And Stahl's produces this:
        PARAMETERS
          PARAMETER_DEF
            MODIFIERS
            TYPE
              LITERAL_int
            IDENT i
          PARAMETER_DEF
            MODIFIERS
            TYPE
              IDENT String
              TYPE_ARGS
              ELLIPSIS
            IDENT strings

I prefer Studman's AST with the explicit VARIABLE_PARAMETER_DEF node.
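
As a sketch of the practical difference: with Studman's tree a varargs
parameter can be recognized from the node type alone, while with Stahl's you
have to look inside TYPE for an ELLIPSIS child. The Node class below is a
hypothetical stand-in, not ANTLR's AST interface, and the two parameter nodes
are built by hand to mirror the dumps above.

```java
import java.util.Arrays;
import java.util.List;

class VarargsShapeDemo {
    // Hypothetical minimal tree node carrying only a token-type name.
    static final class Node {
        final String type;
        final List<Node> kids;
        Node(String type, Node... kids) {
            this.type = type;
            this.kids = Arrays.asList(kids);
        }
        boolean hasChild(String wanted) {
            for (Node k : kids) if (k.type.equals(wanted)) return true;
            return false;
        }
    }

    // Studman's shape: the node type itself says "varargs".
    static boolean isVarargsStudman(Node param) {
        return param.type.equals("VARIABLE_PARAMETER_DEF");
    }

    // Stahl's shape: a PARAMETER_DEF whose TYPE child contains ELLIPSIS.
    static boolean isVarargsStahl(Node param) {
        if (!param.type.equals("PARAMETER_DEF")) return false;
        for (Node k : param.kids) {
            if (k.type.equals("TYPE") && k.hasChild("ELLIPSIS")) return true;
        }
        return false;
    }

    // Hand-built nodes for "String... strings" in each shape.
    static Node studmanParam() {
        return new Node("VARIABLE_PARAMETER_DEF", new Node("MODIFIERS"),
                new Node("TYPE", new Node("IDENT")), new Node("IDENT"));
    }
    static Node stahlParam() {
        return new Node("PARAMETER_DEF", new Node("MODIFIERS"),
                new Node("TYPE", new Node("IDENT"), new Node("TYPE_ARGS"),
                        new Node("ELLIPSIS")),
                new Node("IDENT"));
    }

    public static void main(String[] args) {
        System.out.println(isVarargsStudman(studmanParam()));
        System.out.println(isVarargsStahl(stahlParam()));
    }
}
```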

5) Static imports:
Given this code:
import static java.lang.Math.PI;

Studman's produces this:
  STATIC_IMPORT
    DOT
      DOT
        DOT
          IDENT java
          IDENT lang
        IDENT Math
      IDENT PI

And Stahl's produces this:
  IMPORT
    LITERAL_static
    DOT
      DOT
        DOT
          IDENT java
          IDENT lang
        IDENT Math
      IDENT PI

I prefer Studman's STATIC_IMPORT. The issue here is whether a "static import"
is just an "import" that happens to have a "static" modifier
(as when a variable is static), or whether it's a new type of thing
(in the way that a "static block" differs from a regular block).
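
A small sketch of how an emitter might turn either tree back into source text.
With STATIC_IMPORT the root token type decides the keyword. With Stahl's shape
the emitter has to peek at the first child. The Node class and the method names
are hypothetical, not taken from my JavaEmitter or from ANTLR, and the trees are
built by hand to mirror the dumps above.

```java
class StaticImportEmitDemo {
    // Hypothetical minimal tree node (not ANTLR's AST interface).
    static final class Node {
        final String type;
        final String text;
        final Node[] kids;
        Node(String type, String text, Node... kids) {
            this.type = type;
            this.text = text;
            this.kids = kids;
        }
    }
    static Node id(String s) { return new Node("IDENT", s); }
    static Node dot(Node left, Node right) { return new Node("DOT", ".", left, right); }

    // The left-nested DOT tree for java.lang.Math.PI, as in both dumps.
    static Node name() {
        return dot(dot(dot(id("java"), id("lang")), id("Math")), id("PI"));
    }

    // Flatten a DOT/IDENT tree into a dotted name.
    static String dotted(Node n) {
        if (n.type.equals("DOT")) return dotted(n.kids[0]) + "." + dotted(n.kids[1]);
        return n.text;
    }

    static String emit(Node root) {
        if (root.type.equals("STATIC_IMPORT")) {
            // Studman's shape: the root type alone decides.
            return "import static " + dotted(root.kids[0]) + ";";
        }
        // Stahl's shape: IMPORT with an optional leading LITERAL_static child.
        boolean isStatic = root.kids[0].type.equals("LITERAL_static");
        Node nameNode = root.kids[isStatic ? 1 : 0];
        return "import " + (isStatic ? "static " : "") + dotted(nameNode) + ";";
    }

    static Node studmanTree() { return new Node("STATIC_IMPORT", null, name()); }
    static Node stahlTree() {
        return new Node("IMPORT", null, new Node("LITERAL_static", "static"), name());
    }

    public static void main(String[] args) {
        System.out.println(emit(studmanTree()));
        System.out.println(emit(stahlTree()));
    }
}
```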

Summary:
Assuming both grammars correctly parse Java 1.5 code (which they seem
to, except for the enum problem noted above), choosing one of them to
be the "official" java.g comes down to which produces the "better" AST.
I've listed the differences, and it looks to me like Studman's ASTs
are more consistent with the ASTs we get today.

And of course, some guru should look closely at the chosen grammar to make
sure that it matches the "official" grammar in the JLS, add comments as
needed, make sure token names are consistent, etc.

Andy





 