[antlr-interest] enums in v4 ANTLR Java code generation considered useless

Wed May 19 18:25:54 PDT 2010

I suspect that your benchmark runs afoul of clock granularity issues for the JIT. If you run it a few times you will likely get different results. Also you say 10% better for enums but look at your results again. 

Take the client JIT, your first run gives:

Enum Time: 25707993
Int Time : 28520406

So enum is slightly better, but your second run gives:

Enum Time: 34060167
Int Time : 24820249

And Int time in this run is superior to your enum time by a far greater margin than the reverse in the first run. Your server shows a similar disparity. You have to run for much longer times and repeat many times, then average out because the JIT does not always make the same decision. 

Unless there is something about your print outs that I am missing?

Finally, I would not trust 64 bit openjdk as far as I can throw my house :-)

Finally, finally, you need to look at switch() performance really, and as ANTLR will (does if you set the -X options to the same values as I use in the C generator) use them. There tend to be a fair number of switch cases with some further embedded switches. 

The C optimizer will murder those but the Java JIT has some opportunity to reorder the case at runtime and theoretically it could do better than the C compiler for some use cases. It rarely does though because of other overheads and the fact that most real world applications don't exhibit a polarization to one or two oft used cases out of many. You can see that ANTLR generated code would only do this if out of many alts, just one or two were taken a lot (which would depend on the language being parsed).

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Kirby Bohling
> Sent: Wednesday, May 19, 2010 4:29 PM
> To: Scott Stanchfield
> Cc: antlr-interest interest
> Subject: Re: [antlr-interest] enums in v4 ANTLR Java code generation
> considered useless
> 
> On topic, I think the only important decision to make is from an API
> perspective, while one can go "tweak" the generator, going from int's
> to enums would change the API.  I'd suggest just deciding which one
> you want to support.  Enums are definitely nicer from that
> perspective.  Given the below performance benchmarks, and just how
> much of ANTLR's output is really just a series of "if/else" or switch
> blocks buried inside of a huge number of loops, I actually do think
> you'd spot the difference.
> 
> Moving well off-topic, but since you said to, I did just what you
> suggested:
> 
> Using my personal laptop running Fedora 11 using x86_64 for the kernel
> and JVM:
> $ java -version
> java version "1.6.0_18"
> OpenJDK Runtime Environment (IcedTea6 1.8) (fedora-35.b18.fc11-x86_64)
> OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)
> 
> Both CPU's are Intel(R) Core(TM)2 Duo CPU     P8600  @ 2.40GHz w/ 3MB
> cache.
> 
> These aren't spectacular benchmarks from an accuracy perspective, but
> illustrate that assuming ints and enums have identical performance
> characteristics in all cases is an invalid assumption:
> 
> Using java -Xint Foo:
> Enum Time: 516121334
> Int Time : 424748884
> Enum Time: 514078841
> Int Time : 423574161
> 
> ~21% performance hit to use enums with HotSpot disabled, (similar to
> the DalikVM because it has minimal JIT as of right now, which I'm
> guessing why the original article suggested you stay away from them
> near performance critical areas).
> 
> Using: java -client Foo
> Enum Time: 25707993
> Int Time : 28520406
> Enum Time: 34060167
> Int Time : 24820249
> 
> ~10% speed up for using enums.
> 
> Using: java -server Foo
> Enum Time: 25543589
> Int Time : 28637110
> Enum Time: 32887612
> Int Time : 28968574
> 
> Again ~10% speed up for using enums.
> 
> So there might actually be a reason to support Enum's internally from
> a speed/performance perspective if the non-JIT case is considered
> negligible.  I thought they'd match your claim in this case.  Didn't
> have any reason to actually think enums would be faster then int's.
> 
> -- Sample code:
> 
> public class Foo {
> 
>     private static long MAX = 10000000;
> 
>     public static void main(String[] args) {
>         doEnums();
>         doInts();
>         doEnums();
>         doInts();
>     }
> 
>     public static void doInts() {
>         int val = 0;
>         long start = System.nanoTime();
>         for (long iii = 0; iii < MAX; ++iii) {
>             if (0 == val) {
>                 val = 1;
>             } else if (1 == val) {
>                 val = 0;
>             }
>         }
>         long end = System.nanoTime();
>         System.out.println("Int Time : " + (end - start));
>     }
> 
>     enum Parity { EVEN, ODD };
>     public static void doEnums() {
>         Parity val = Parity.EVEN;
>         long start = System.nanoTime();
>         for (long iii = 0; iii < MAX; ++iii) {
>             if (Parity.EVEN == val) {
>                 val = Parity.ODD;
>             } else if (Parity.ODD == val) {
>                 val = Parity.EVEN;
>             }
>         }
>         long end = System.nanoTime();
>         System.out.println("Enum Time: " + (end - start));
>     }
> 
> }
> 
> 
> On Wed, May 19, 2010 at 3:30 PM, Scott Stanchfield <scott at javadude.com>
> wrote:
> > Don't pre-optimize for things like this. Profile, then optimize. This
> > won't even show up as an issue.
> >
> > I think whoever wrote that page was daydreaming about any minor way
> > performance might be increased - note that they don't talk at all on
> > that page about the big performance issues (I/O, networking, etc),
> > though I do like that they talk about limiting object creation.
> >
> > With the example they show on that android dev page, you'll never
> > see/feel the difference. And their example on grabbing the ordinal
> > value so you don't need to lookup a static field is really silly. If
> > they just want to avoid looking up the static field everytime through
> > the loop, don't do:
> >
> >     int valX = MyEnum.VAL_X.ordinal();
> >    int valY = MyEnum.VAL_Y.ordinal();
> >    int count = list.size();
> >    MyItem items = list.items();
> >    for (int  n = 0; n < count; n++)   {
> >        int  valItem = items[n].e.ordinal();
> >        if (valItem == valX)
> >            // do stuff 1
> >        else if (valItem == valY)
> >            // do stuff 2
> >    }
> >
> > instead do
> >
> >    MyEnum valX = MyEnum.VAL_X;
> >    MyEnum valY = MyEnum.VAL_Y;
> >    int count = list.size();
> >    MyItem items = list.items();
> >    for (int  n = 0; n < count; n++)   {
> >        MyEnum valItem = items[n].e;
> >        if (valItem == valX)
> >            // do stuff 1
> >        else if (valItem == valY)
> >            // do stuff 2
> >    }
> >
> > Stuff like that makes me think whoever wrote that really didn't think
> > it through all the way. The pointer comparison is the same expense as
> > the int comparison and avoids n+2 calls to ordinal() in their example
> > code.
> >
> > Moreso, the suggestion to use constants that the compiler will inline
> > is truly evil. Compiler constant inlining can very easily lead to
> > incorrect constant values when a library (that provides a constant)
> > changes (new jar dropped in with a new value for the constant) but
> the
> > code using that library isn't recompiled. Safety issue.
> >
> > If this becomes an issue (which I doubt it will), someone can always
> > extend the code generator to tweak it.
> > -- Scott
> >
> > ----------------------------------------
> > Scott Stanchfield
> > http://javadude.com
> >
> >
> >
> > On Wed, May 19, 2010 at 3:59 PM, Kirby Bohling
> <kirby.bohling at gmail.com> wrote:
> >> On Wed, May 19, 2010 at 2:13 PM, Scott Stanchfield
> <scott at javadude.com> wrote:
> >>> Interesting point re common code generation approaches, but as far
> as
> >>> performance goes, it's equivalent - all == tests are done using
> >>> pointers, which are the same size as ints. If switch is used the
> >>> ordinal values of the enums are used, and the java compiler may be
> >>> able to better optimize which switch bytecode is used b/c it knows
> the
> >>> exact possible range of values.
> >>
> >> That's true of most full scale JVMs with good JIT, but for many
> >> embedded VM's that isn't true.  See the Dalvik VM for Android.
> >>
> >> This link for instance:
> >>
> http://developer.android.com/guide/practices/design/performance.html#av
> oid_enums
> >>
> >> I believe it is becoming less true as time goes along, but from what
> I
> >> know right now it is true.
> >>
> >> If you can't support generating both, I'd agree with Jim Idle
> support
> >> the one that will go everywhere.  If however you could treat it like
> >> the C target does with using switch vs. if/else, I'd think that'd be
> >> nifty.  Doubly so because maintenance burden is free when somebody
> >> else is doing the work.  As this affects the external API, I would
> >> assume that it's a non-option to generate one or the other.
> >>
> >>
> >>>
> >>> I'd much rather use enums where available, though. I'd think any
> code
> >>> generator could generate a simple int equivalent where enums don't
> >>> exist, though. The only "gotcha" would be if we had the
> >>> pattern/description properties, which would have to be represented
> as
> >>> separate arrays in most languages. They aren't necessary though
> (but
> >>> I'd love to have them)
> >>> -- Scott
> >>>
> >>> ----------------------------------------
> >>> Scott Stanchfield
> >>> http://javadude.com
> >>>
> >>>
> >>>
> >>> On Wed, May 19, 2010 at 3:04 PM, Jim Idle <jimi at temporal-wave.com>
> wrote:
> >>>> I also have doubts about the performance characteristics and the
> possibility of starting to rely on the target language to fill in gaps
> such as token numbering - we could get to the point where code
> generators cannot be built for more primitive languages because the
> schema is relying the language to automatically do things.
> >>>>
> >>>> The generated code should be as primitive as possible, with the
> runtime being as maintainable and clear as possible while not
> sacrificing performance.
> >>>>
> >>>> Jim
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> >>>>> bounces at antlr.org] On Behalf Of Terence Parr
> >>>>> Sent: Wednesday, May 19, 2010 11:35 AM
> >>>>> To: antlr-interest interest
> >>>>> Subject: Re: [antlr-interest] enums in v4 ANTLR Java code
> generation
> >>>>> considered useless
> >>>>>
> >>>>>
> >>>>> On May 18, 2010, at 2:58 PM, Scott Stanchfield wrote:
> >>>>>
> >>>>> > There are several advantages to enums:
> >>>>> > * there is a discrete set of values that can be used (no
> accidental
> >>>>> > 42's passed in when 42 isn't a token type)
> >>>>> > * the enum value can carry extra information
> >>>>> > * the enum values can override methods differently
> >>>>>
> >>>>> These are all excellent advantages. I believe that these mostly
> apply
> >>>>> when you're writing code, not generating. Just like the compiler
> >>>>> generates integers underneath, if antlr is generating integers,
> it's
> >>>>> probably okay.
> >>>>>
> >>>>> > OH - one of the things that's clouding this is that you really
> don't
> >>>>> > need the numeric type identifers anymore. You can just have
> >>>>> >
> >>>>> >  public enum TokenType {
> >>>>> >    IDENT, INT ...;
> >>>>> >  }
> >>>>> >
> >>>>> > then in your match method:
> >>>>> >
> >>>>> >  void match(TokenType type) {
> >>>>> >    if (LA(1).getType() == type) {
> >>>>> >        ...
> >>>>> >    }
> >>>>> >  }
> >>>>>
> >>>>> The only problem is that match() lives up in the superclass in
> the
> >>>>> library but the generated parser needs to define the enum.
> >>>>>
> >>>>> I also have the problem that I need to merge token types from
> multiple
> >>>>> grammars for grammar imports. This gets more competition with
> enum
> >>>>> types without inheritance.
> >>>>>
> >>>>> >
> >>>>> > And you can use the types in a switch statement:
> >>>>> >
> >>>>> >  switch(type) {
> >>>>> >    case INT:
> >>>>> >    case IDENT:
> >>>>> >    ...
> >>>>> >  }
> >>>>> >
> >>>>> > No more magic numbers! Woohoo!
> >>>>>
> >>>>> ANTLR already uses the labels when possible such as INT. If you
> use a
> >>>>> literal in your grammar such as ';' in don't label it in the
> lexer,
> >>>>> than I had no choice but to generate the integer token type or a
> weird
> >>>>> label like TOKEN34.
> >>>>>
> >>>>> Ter
> >>>>>
> >>>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> >>>>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-
> interest/your-
> >>>>> email-address
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> >>>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-
> interest/your-email-address
> >>>>
> >>>
> >>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> >>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-
> interest/your-email-address
> >>>
> >>
> >
> 
> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
> email-address