[antlr-interest] enums in v4 ANTLR Java code generation considered useless

Wed May 19 16:29:07 PDT 2010

On topic, I think the only important decision to make is from an API
perspective: while one can go "tweak" the generator, going from ints
to enums would change the API.  I'd suggest just deciding which one
you want to support.  Enums are definitely nicer from that
perspective.  Given the performance benchmarks below, and just how
much of ANTLR's output is really just a series of "if/else" or switch
blocks buried inside of a huge number of loops, I actually do think
you'd spot the difference.
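
To make the API point concrete, here is a minimal sketch of the two
shapes side by side.  All names here (TokenTypeApiSketch, T_IDENT,
T_INT, TokenType, describeInt, describeEnum) and the numeric values are
made up for illustration; this is not ANTLR's actual generated code.

```java
class TokenTypeApiSketch {
    // Style 1: int token types exposed as plain constants (values are arbitrary).
    static final int T_IDENT = 4;
    static final int T_INT = 5;

    // Style 2: enum token types.
    enum TokenType { IDENT, INT }

    // Callers of the int API can pass any int, including one that is not a token type.
    static String describeInt(int type) {
        switch (type) {
            case T_IDENT: return "identifier";
            case T_INT:   return "integer";
            default:      return "unknown";
        }
    }

    // Callers of the enum API can only pass a declared constant (or null).
    static String describeEnum(TokenType type) {
        switch (type) {
            case IDENT: return "identifier";
            case INT:   return "integer";
            default:    return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(describeInt(T_IDENT));        // identifier
        System.out.println(describeInt(42));             // unknown
        System.out.println(describeEnum(TokenType.INT)); // integer
    }
}
```

The signatures differ (int vs. TokenType), so user code written against
one style does not compile against the other.  That is why I'd treat
this as an API decision rather than a generator tweak.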

Moving well off-topic, but since you said to, I did just what you suggested:

Using my personal laptop running Fedora 11 using x86_64 for the kernel and JVM:
$ java -version
java version "1.6.0_18"
OpenJDK Runtime Environment (IcedTea6 1.8) (fedora-35.b18.fc11-x86_64)
OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)

Both CPUs are Intel(R) Core(TM)2 Duo CPU     P8600  @ 2.40GHz w/ 3MB cache.

These aren't spectacular benchmarks from an accuracy perspective, but
illustrate that assuming ints and enums have identical performance
characteristics in all cases is an invalid assumption:

Using java -Xint Foo:
Enum Time: 516121334
Int Time : 424748884
Enum Time: 514078841
Int Time : 423574161

~21% performance hit to use enums with the HotSpot JIT disabled
(interpreter only).  That is similar to the Dalvik VM, which has
minimal JIT as of right now; I'm guessing that's why the original
article suggested you stay away from them near performance-critical
areas.

Using: java -client Foo
Enum Time: 25707993
Int Time : 28520406
Enum Time: 34060167
Int Time : 24820249

~10% speed up for using enums in the first pair of runs; in the second
pair the int loop is the faster one.

Using: java -server Foo
Enum Time: 25543589
Int Time : 28637110
Enum Time: 32887612
Int Time : 28968574

Again ~10% speed up for using enums in the first pair of runs; the
second pair again favors ints.

So there might actually be a reason to support enums internally from
a speed/performance perspective if the non-JIT case is considered
negligible.  I thought the results would match your claim in this
case; I didn't have any reason to think enums would actually be faster
than ints.

-- Sample code:

public class Foo {

    private static long MAX = 10000000;

    public static void main(String[] args) {
        doEnums();
        doInts();
        doEnums();
        doInts();
    }

    public static void doInts() {
        int val = 0;
        long start = System.nanoTime();
        for (long iii = 0; iii < MAX; ++iii) {
            if (0 == val) {
                val = 1;
            } else if (1 == val) {
                val = 0;
            }
        }
        long end = System.nanoTime();
        System.out.println("Int Time : " + (end - start));
    }

    enum Parity { EVEN, ODD };
    public static void doEnums() {
        Parity val = Parity.EVEN;
        long start = System.nanoTime();
        for (long iii = 0; iii < MAX; ++iii) {
            if (Parity.EVEN == val) {
                val = Parity.ODD;
            } else if (Parity.ODD == val) {
                val = Parity.EVEN;
            }
        }
        long end = System.nanoTime();
        System.out.println("Enum Time: " + (end - start));
    }

}
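
Since the numbers above swing between the first and second pass, a
slightly more careful variant might run untimed warm-up rounds first
and make each loop return a value so the JIT cannot discard it.  The
sketch below does that.  The class name WarmBench, the flips counter,
and the round counts are my own additions, so it measures a slightly
different workload than Foo, and it is still not a rigorous benchmark
(a dedicated harness such as JMH would be more trustworthy).

```java
class WarmBench {
    enum Parity { EVEN, ODD }

    // Same toggle as doInts(), but counts 0 -> 1 flips and returns the count
    // so the loop has an observable result.
    static long flipInts(long iterations) {
        int val = 0;
        long flips = 0;
        for (long i = 0; i < iterations; ++i) {
            if (val == 0) {
                val = 1;
                ++flips;
            } else if (val == 1) {
                val = 0;
            }
        }
        return flips;
    }

    // Same toggle as doEnums(), counting EVEN -> ODD flips.
    static long flipEnums(long iterations) {
        Parity val = Parity.EVEN;
        long flips = 0;
        for (long i = 0; i < iterations; ++i) {
            if (val == Parity.EVEN) {
                val = Parity.ODD;
                ++flips;
            } else if (val == Parity.ODD) {
                val = Parity.EVEN;
            }
        }
        return flips;
    }

    public static void main(String[] args) {
        final long n = 10000000L;
        long sink = 0;

        // Untimed warm-up rounds so both loops have a chance to be compiled.
        for (int r = 0; r < 5; ++r) {
            sink += flipInts(n) + flipEnums(n);
        }

        // Timed rounds, printed one per line for comparison.
        for (int r = 0; r < 5; ++r) {
            long t0 = System.nanoTime();
            sink += flipEnums(n);
            long t1 = System.nanoTime();
            sink += flipInts(n);
            long t2 = System.nanoTime();
            System.out.println("Enum Time: " + (t1 - t0) + "  Int Time : " + (t2 - t1));
        }

        // Print the accumulated result so it is actually used.
        System.out.println("sink: " + sink);
    }
}
```

Comparing several timed rounds rather than a single pair should make it
easier to tell a real difference from run-to-run noise.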

On Wed, May 19, 2010 at 3:30 PM, Scott Stanchfield <scott at javadude.com> wrote:
> Don't pre-optimize for things like this. Profile, then optimize. This
> won't even show up as an issue.
>
> I think whoever wrote that page was daydreaming about any minor way
> performance might be increased - note that they don't talk at all on
> that page about the big performance issues (I/O, networking, etc),
> though I do like that they talk about limiting object creation.
>
> With the example they show on that android dev page, you'll never
> see/feel the difference. And their example on grabbing the ordinal
> value so you don't need to look up a static field is really silly. If
> they just want to avoid looking up the static field every time through
> the loop, don't do:
>
>    int valX = MyEnum.VAL_X.ordinal();
>    int valY = MyEnum.VAL_Y.ordinal();
>    int count = list.size();
>    MyItem[] items = list.items();
>    for (int n = 0; n < count; n++) {
>        int valItem = items[n].e.ordinal();
>        if (valItem == valX) {
>            // do stuff 1
>        } else if (valItem == valY) {
>            // do stuff 2
>        }
>    }
>
> instead do
>
>    MyEnum valX = MyEnum.VAL_X;
>    MyEnum valY = MyEnum.VAL_Y;
>    int count = list.size();
>    MyItem[] items = list.items();
>    for (int n = 0; n < count; n++) {
>        MyEnum valItem = items[n].e;
>        if (valItem == valX) {
>            // do stuff 1
>        } else if (valItem == valY) {
>            // do stuff 2
>        }
>    }
>
> Stuff like that makes me think whoever wrote that really didn't think
> it through all the way. The pointer comparison is the same expense as
> the int comparison and avoids n+2 calls to ordinal() in their example
> code.
>
> Moreover, the suggestion to use constants that the compiler will inline
> is truly evil. Compiler constant inlining can very easily lead to
> incorrect constant values when a library (that provides a constant)
> changes (new jar dropped in with a new value for the constant) but the
> code using that library isn't recompiled. Safety issue.
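
The inlining behaviour described above can be observed in a single
file.  This is a minimal sketch with made-up names
(ConstantInliningDemo, Lib, INLINED, NOT_INLINED): reading a
compile-time constant does not even initialize the class that declares
it, because javac has copied the value into the caller, while a field
whose initializer is not a constant expression is read at run time.

```java
class ConstantInliningDemo {
    // Set by Lib's static initializer so we can see when Lib is initialized.
    static boolean libInitialized = false;

    // Hypothetical stand-in for a class shipped in a separate library jar.
    static class Lib {
        // Compile-time constant: javac copies 42 into every class that uses it.
        static final int INLINED = 42;
        // Not a constant expression: the value is read from Lib at run time.
        static final int NOT_INLINED = Integer.valueOf(42);

        static {
            libInitialized = true;
        }
    }

    static int readInlined() {
        return Lib.INLINED;
    }

    static int readNotInlined() {
        return Lib.NOT_INLINED;
    }

    public static void main(String[] args) {
        // Prints "42 initialized=false": Lib was never touched.
        System.out.println(readInlined() + " initialized=" + libInitialized);
        // Prints "42 initialized=true": this read goes through Lib.
        System.out.println(readNotInlined() + " initialized=" + libInitialized);
    }
}
```

If Lib lived in a separate jar and INLINED later changed, callers
compiled against the old jar would keep using 42 until recompiled,
which is the safety issue in question.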
>
> If this becomes an issue (which I doubt it will), someone can always
> extend the code generator to tweak it.
> -- Scott
>
> ----------------------------------------
> Scott Stanchfield
> http://javadude.com
>
>
>
> On Wed, May 19, 2010 at 3:59 PM, Kirby Bohling <kirby.bohling at gmail.com> wrote:
>> On Wed, May 19, 2010 at 2:13 PM, Scott Stanchfield <scott at javadude.com> wrote:
>>> Interesting point re common code generation approaches, but as far as
>>> performance goes, it's equivalent - all == tests are done using
>>> pointers, which are the same size as ints. If switch is used the
>>> ordinal values of the enums are used, and the java compiler may be
>>> able to better optimize which switch bytecode is used b/c it knows the
>>> exact possible range of values.
>>
>> That's true of most full scale JVMs with good JIT, but for many
>> embedded VMs that isn't true.  See the Dalvik VM for Android.
>>
>> This link for instance:
>> http://developer.android.com/guide/practices/design/performance.html#avoid_enums
>>
>> I believe it is becoming less true as time goes along, but from what I
>> know right now it is true.
>>
>> If you can't support generating both, I'd agree with Jim Idle: support
>> the one that will go everywhere.  If however you could treat it like
>> the C target does with using switch vs. if/else, I'd think that'd be
>> nifty.  Doubly so because maintenance burden is free when somebody
>> else is doing the work.  As this affects the external API, I would
>> assume that it's a non-option to generate one or the other.
>>
>>
>>>
>>> I'd much rather use enums where available, though. I'd think any code
>>> generator could generate a simple int equivalent where enums don't
>>> exist, though. The only "gotcha" would be if we had the
>>> pattern/description properties, which would have to be represented as
>>> separate arrays in most languages. They aren't necessary though (but
>>> I'd love to have them)
>>> -- Scott
>>>
>>> ----------------------------------------
>>> Scott Stanchfield
>>> http://javadude.com
>>>
>>>
>>>
>>> On Wed, May 19, 2010 at 3:04 PM, Jim Idle <jimi at temporal-wave.com> wrote:
>>>> I also have doubts about the performance characteristics and the possibility of starting to rely on the target language to fill in gaps such as token numbering - we could get to the point where code generators cannot be built for more primitive languages because the schema is relying on the language to automatically do things.
>>>>
>>>> The generated code should be as primitive as possible, with the runtime being as maintainable and clear as possible while not sacrificing performance.
>>>>
>>>> Jim
>>>>
>>>>> -----Original Message-----
>>>>> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
>>>>> bounces at antlr.org] On Behalf Of Terence Parr
>>>>> Sent: Wednesday, May 19, 2010 11:35 AM
>>>>> To: antlr-interest interest
>>>>> Subject: Re: [antlr-interest] enums in v4 ANTLR Java code generation
>>>>> considered useless
>>>>>
>>>>>
>>>>> On May 18, 2010, at 2:58 PM, Scott Stanchfield wrote:
>>>>>
>>>>> > There are several advantages to enums:
>>>>> > * there is a discrete set of values that can be used (no accidental
>>>>> > 42's passed in when 42 isn't a token type)
>>>>> > * the enum value can carry extra information
>>>>> > * the enum values can override methods differently
>>>>>
>>>>> These are all excellent advantages. I believe that these mostly apply
>>>>> when you're writing code, not generating. Just like the compiler
>>>>> generates integers underneath, if antlr is generating integers, it's
>>>>> probably okay.
>>>>>
>>>>> > OH - one of the things that's clouding this is that you really don't
>>>>> > need the numeric type identifiers anymore. You can just have
>>>>> >
>>>>> >  public enum TokenType {
>>>>> >    IDENT, INT ...;
>>>>> >  }
>>>>> >
>>>>> > then in your match method:
>>>>> >
>>>>> >  void match(TokenType type) {
>>>>> >    if (LA(1).getType() == type) {
>>>>> >        ...
>>>>> >    }
>>>>> >  }
>>>>>
>>>>> The only problem is that match() lives up in the superclass in the
>>>>> library but the generated parser needs to define the enum.
>>>>>
>>>>> I also have the problem that I need to merge token types from multiple
>>>>> grammars for grammar imports. This gets more complicated with enum
>>>>> types without inheritance.
>>>>>
>>>>> >
>>>>> > And you can use the types in a switch statement:
>>>>> >
>>>>> >  switch(type) {
>>>>> >    case INT:
>>>>> >    case IDENT:
>>>>> >    ...
>>>>> >  }
>>>>> >
>>>>> > No more magic numbers! Woohoo!
>>>>>
>>>>> ANTLR already uses the labels when possible such as INT. If you use a
>>>>> literal in your grammar such as ';' and don't label it in the lexer,
>>>>> then I have no choice but to generate the integer token type or a weird
>>>>> label like TOKEN34.
>>>>>
>>>>> Ter
>>>>>
>>>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>>>>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
>>>>> email-address