[antlr-interest] Re: Comparing ASTs of the two Java1.5 grammars

Michael Stahl gcpa-antlr-interest at m.gmane.org
Tue Nov 9 12:37:38 PST 2004


On Mon, 25 Oct 2004 20:25:45 +0000, atripp54321 wrote:
> I went to update my JavaEmitter code for the new JDK1.5 grammar,
> and I see we actually have two JDK1.5 grammars listed at antlr.org:
> one by Michael Studman and another by Michael Stahl.
> My code depends on the "shape" of the Java AST produced
> by the grammar, and I'm sure eventually one of these two will
> need to be chosen to be included with ANTLR as the "official" java.g.
> 
> So I tried out these two grammars on the 
> various new 1.5 features, and here are my notes on
> the ASTs that each of these grammars produce.
> For reference, here's the Sun proposed Java 1.5 grammar:
> http://java.sun.com/docs/books/jls/jls-proposed-changes.html
> 
> 1) Annotations
> Neither grammar stores annotations in the AST.
> This seems right to me, as we don't store comments in the AST either.
> Anyone who's annoyed that comments are not stored in the AST
> will now be even more annoyed :)

hm, my grammar should never throw annotations away.
if it does, that is a bug.
do you have a test case?
 
> 2) Generics:
> Given this code:
>     public Vector(Collection<? extends E> c) {
> 
> Studman's produces this:
>             TYPE
>               IDENT Collection
>                 TYPE_ARGUMENTS
>                   TYPE_ARGUMENT
>                     WILDCARD_TYPE
>                       TYPE_UPPER_BOUNDS
>                         IDENT E
>  
> And Stahl's produces this:
>            TYPE
>              IDENT Collection
>              TYPE_ARGS
>                WILDCARD
>                  LITERAL_extends
>                  TYPE
>                    IDENT E
>                    TYPE_ARGS
> 
> a) One places the type-argument subtree as a child of IDENT, the other as a sibling.
> I prefer Stahl's...seems strange for IDENT to have a child.
> b) Studman's has the extra TYPE_ARGUMENT node, which I prefer.

mine would have a TYPE if the argument were not a WILDCARD,
maybe i should have called it WILDCARD_ARG...
i would say that the extra TYPE_ARGUMENT is superfluous in this case,
since you can only have exactly one TYPE or exactly one WILDCARD
within it anyway.

> c) The two trees are different under WILDCARD_TYPE. I prefer Studman's
> but I'd rename "TYPE_UPPER_BOUNDS" to "TYPE_EXTENDS" (and
> "TYPE_LOWER_BOUNDS" to "TYPE_SUPER").
> d) That extra TYPE_ARGS at the end of Stahl's shouldn't be there (I think)

that's not a bug, that's a feature :)
my TYPE nodes always come with a TYPE_ARGS nested within, even if
there aren't any type args. i thought it makes more sense this way,
it is similar to e.g. MODIFIERS.
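
to illustrate why an always-present TYPE_ARGS can be convenient for tree consumers, here is a minimal sketch. note that ToyNode, firstChild and the two count methods are made up for this example; this is not the antlr.collections.AST interface and not code from either grammar.

```java
import java.util.ArrayList;
import java.util.List;

// toy AST node for illustration only; NOT the antlr.collections.AST interface
final class ToyNode {
    final String type;
    final String text; // may be null for nodes like TYPE or TYPE_ARGS
    final List<ToyNode> children = new ArrayList<>();

    ToyNode(String type, String text, ToyNode... kids) {
        this.type = type;
        this.text = text;
        for (ToyNode k : kids) {
            children.add(k);
        }
    }

    // returns the first direct child of the given type, or null if there is none
    ToyNode firstChild(String wanted) {
        for (ToyNode c : children) {
            if (c.type.equals(wanted)) {
                return c;
            }
        }
        return null;
    }
}

final class TypeArgsDemo {
    // shape where TYPE_ARGS is always present: no null check needed
    static int countArgsAlwaysPresent(ToyNode type) {
        return type.firstChild("TYPE_ARGS").children.size();
    }

    // shape where TYPE_ARGS is optional: every consumer has to guard
    static int countArgsOptional(ToyNode type) {
        ToyNode args = type.firstChild("TYPE_ARGS");
        return args == null ? 0 : args.children.size();
    }

    public static void main(String[] argv) {
        // TYPE for plain "E": the TYPE_ARGS child exists but is empty
        ToyNode e = new ToyNode("TYPE", null,
                new ToyNode("IDENT", "E"),
                new ToyNode("TYPE_ARGS", null));
        // TYPE for "Collection<? extends E>"
        ToyNode coll = new ToyNode("TYPE", null,
                new ToyNode("IDENT", "Collection"),
                new ToyNode("TYPE_ARGS", null,
                        new ToyNode("WILDCARD", null,
                                new ToyNode("LITERAL_extends", null), e)));
        System.out.println(countArgsAlwaysPresent(e));    // prints 0
        System.out.println(countArgsAlwaysPresent(coll)); // prints 1
    }
}
```

the trade-off is a slightly bigger tree in exchange for simpler walkers, which is the same bargain as an empty MODIFIERS node.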

> 2) For-each loop:
> Given this code:
>                 for (Integer i : integers) {
>                 }
> 
> Studman's produces this:
>          LITERAL_for
>            FOR_EACH_CLAUSE
>              PARAMETER_DEF
>                MODIFIERS
>                TYPE
>                  IDENT Integer
>                IDENT i
>              EXPR
>                IDENT integers
>            SLIST
> 
> And Stahl's produces this:
>           LITERAL_for
>             PARAMETER_DEF
>               MODIFIERS
>               TYPE
>                 IDENT Integer
>                 TYPE_ARGS
>               IDENT i
>             EXPR
>               IDENT integers
>             SLIST
> 
> I prefer Studman's with the "FOR_EACH_CLAUSE" node, which parallels the
> "FOR_INIT", "FOR_CONDITION", and "FOR_ITERATOR" nodes in the old "for" syntax.

oh, i just noticed that i forgot this.
my whitespace-preserving parser puts an ENHANCED_FOR node there, right
where the FOR_EACH_CLAUSE goes, but the grammar i put up on antlr.org
does not.

> 3) Enums:
> Given this code:
>    enum Rank2 implements whatever {ONE, TWO, THREE}
> Studman's produces this:
>       ENUM_DEF
>         MODIFIERS
>         IDENT Rank2
>         IMPLEMENTS_CLAUSE
>           IDENT whatever
>         OBJBLOCK
>           ENUM_CONSTANT_DEF
>             ANNOTATIONS
>             IDENT ONE
>           ENUM_CONSTANT_DEF
>             ANNOTATIONS
>             IDENT TWO
>           ENUM_CONSTANT_DEF
>             ANNOTATIONS
>             IDENT THREE
> 
> Stahl's failed with "unexpected token" exception.
> 
> Given a full enum definition, Studman's produced an AST that's identical
> to a class definition, but with ENUM_DEF in place of CLASS_DEF.
> Stahl's failed on this one too.

oh, that would be because you forgot to turn on the enum keyword
in the lexer. it is off by default, as it is not backwards compatible
with java 1.4 code. just call the enableEnum() method of the lexer
and try again.
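
a rough sketch of the idea behind the switch, in case it helps. everything here except the method name enableEnum() is an assumption: ToyLexer, classify and the token names are invented for illustration and are not the generated lexer. in real use you would call enableEnum() on the lexer object generated from the grammar before parsing.

```java
// toy stand-in for a lexer with an opt-in "enum" keyword; NOT the generated ANTLR lexer
final class ToyLexer {
    private boolean enumEnabled = false; // off by default, so java 1.4 code still lexes

    // named after the method mentioned above; the body here is an assumption
    void enableEnum() {
        enumEnabled = true;
    }

    // classifies a single word; the token names are illustrative
    String classify(String word) {
        if (word.equals("enum")) {
            return enumEnabled ? "LITERAL_enum" : "IDENT";
        }
        return "IDENT";
    }
}

final class EnumSwitchDemo {
    public static void main(String[] args) {
        ToyLexer lexer = new ToyLexer();
        // java 1.4 style code may use "enum" as a variable name
        System.out.println(lexer.classify("enum")); // prints IDENT
        lexer.enableEnum();
        System.out.println(lexer.classify("enum")); // prints LITERAL_enum
    }
}
```

with the switch off, "enum" is an ordinary identifier, so a parser expecting an enum declaration sees an unexpected token, which would match the exception you got.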

> 4) Varargs:
> Given this code:
> 	void test(int i, String... strings)
> 
> Studman's produces this:
>         PARAMETERS
>           PARAMETER_DEF
>             MODIFIERS
>             TYPE
>               LITERAL_int
>             IDENT i
>           VARIABLE_PARAMETER_DEF
>             MODIFIERS
>             TYPE
>               IDENT String
>             IDENT strings
> 
> And Stahl's produces this:
>         PARAMETERS
>           PARAMETER_DEF
>             MODIFIERS
>             TYPE
>               LITERAL_int
>             IDENT i
>           PARAMETER_DEF
>             MODIFIERS
>             TYPE
>               IDENT String
>               TYPE_ARGS
>               ELLIPSIS
>             IDENT strings
> 
> I prefer Studman's AST with the explicit VARIABLE_PARAMETER_DEF node.

hm... i think my ELLIPSIS node there sucks, no idea why i put it
there :)

> 5) Static imports:
> Given this code:
> import static java.lang.Math.PI;
> 
> Studman's produces this:
>   STATIC_IMPORT
>     DOT
>       DOT
>         DOT
>           IDENT java
>           IDENT lang
>         IDENT Math
>       IDENT PI
> 
> And Stahl's produces this:
>   IMPORT
>     LITERAL_static
>     DOT
>       DOT
>         DOT
>           IDENT java
>           IDENT lang
>         IDENT Math
>       IDENT PI
> 
> I prefer Studman's STATIC_IMPORT. The issue here is whether a "static import"
> is just an "import" that happens to have a "static" modifier
> (as when a variable is static), or whether it's a new type of thing
> (in the way that a "static block" differs from a regular block).

hm, that's what i was asking myself...
will java 1.6 have a "import private", to do away with that
pesky information hiding?
or maybe "import final", for when you're _really_ sure you need
something? "import volatile" when you're not so sure?
oh, sorry i am blathering nonsense.
of course, there will be no java 1.6, they'll call it java 6.0 instead.
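
more seriously: for code that consumes the tree, the two shapes differ only in where the check goes. a minimal sketch, assuming a toy node class (ImportNode and isStaticImport are hypothetical names, not part of either grammar or of ANTLR):

```java
import java.util.ArrayList;
import java.util.List;

// toy node for illustration only; NOT the antlr.collections.AST interface
final class ImportNode {
    final String type;
    final List<ImportNode> children = new ArrayList<>();

    ImportNode(String type, ImportNode... kids) {
        this.type = type;
        for (ImportNode k : kids) {
            children.add(k);
        }
    }
}

final class StaticImportDemo {
    // accepts either tree shape and reports whether it is a static import
    static boolean isStaticImport(ImportNode n) {
        if (n.type.equals("STATIC_IMPORT")) {
            return true; // dedicated root node: the root type alone decides
        }
        if (n.type.equals("IMPORT")) {
            // modifier-style shape: look for a leading LITERAL_static child
            return !n.children.isEmpty()
                    && n.children.get(0).type.equals("LITERAL_static");
        }
        return false;
    }

    public static void main(String[] args) {
        ImportNode dedicated = new ImportNode("STATIC_IMPORT", new ImportNode("DOT"));
        ImportNode modifier = new ImportNode("IMPORT",
                new ImportNode("LITERAL_static"), new ImportNode("DOT"));
        ImportNode plain = new ImportNode("IMPORT", new ImportNode("DOT"));
        System.out.println(isStaticImport(dedicated)); // prints true
        System.out.println(isStaticImport(modifier));  // prints true
        System.out.println(isStaticImport(plain));     // prints false
    }
}
```

with a dedicated root the walker can dispatch on the node type; with the modifier shape every IMPORT handler has to peek at the first child.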

> Summary:
> Given that these two both correctly parse Java 1.5 code (which they seem
> to except for the enum problem noted above), choosing one of these to
> be the "official" java.g comes down to which produces a "better" AST.
> I've listed the differences, and it looks to me like Studman's ASTs
> are more consistent with the ASTs we get today.
> 
> And of course, some guru should look closely at the grammar to make
> sure that it matches the "official" grammar in the JLS, add comments as
> needed, make sure token names are consistent, etc.

i have already checked that my grammar matches the p-f-d (or was it
f-p-d?) of the relevant jsrs that were published in july/august
iirc. excepting some things that could be done in the parser, but
which are better checked in a semantic pass imho.
i hope they haven't changed the syntax yet again since then...

thanks for looking at things :)

michael stahl




 