[antlr-interest] enums in v4 ANTLR Java code generation considered useless

Kirby Bohling kirby.bohling at gmail.com
Wed May 19 12:59:17 PDT 2010


On Wed, May 19, 2010 at 2:13 PM, Scott Stanchfield <scott at javadude.com> wrote:
> Interesting point re common code generation approaches, but as far as
> performance goes, it's equivalent - all == tests are done using
> pointers, which are the same size as ints. If switch is used the
> ordinal values of the enums are used, and the java compiler may be
> able to better optimize which switch bytecode is used b/c it knows the
> exact possible range of values.

That's true of most full-scale JVMs with a good JIT, but it isn't true
for many embedded VMs.  See the Dalvik VM for Android.

This link for instance:
http://developer.android.com/guide/practices/design/performance.html#avoid_enums

I believe the enum penalty is shrinking as time goes along, but from
what I know it still applies right now.
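
To make the cost concrete, here is a rough Java sketch of what a switch
over an enum turns into.  javac does not switch on the enum reference
itself; it goes through a synthetic int array indexed by ordinal(),
which is approximated by hand below.  The TokenType enum, SWITCH_MAP and
the method names are all made up for illustration, and the real
synthetic class is laid out somewhat differently.

```java
class EnumSwitchSketch {
    // Hypothetical token types, for illustration only.
    enum TokenType { IDENT, INT, SEMI }

    // What you write in source.
    static String describe(TokenType t) {
        switch (t) {
            case IDENT: return "identifier";
            case INT:   return "integer";
            default:    return "other";
        }
    }

    // Hand-written approximation of the compiler's translation: an int[]
    // indexed by ordinal() maps each enum constant to a case number.
    // Entries left at 0 fall through to the default branch.
    static final int[] SWITCH_MAP = new int[TokenType.values().length];
    static {
        SWITCH_MAP[TokenType.IDENT.ordinal()] = 1;
        SWITCH_MAP[TokenType.INT.ordinal()] = 2;
    }

    // The switch itself is then an ordinary int switch.
    static String describeDesugared(TokenType t) {
        switch (SWITCH_MAP[t.ordinal()]) {
            case 1:  return "identifier";
            case 2:  return "integer";
            default: return "other";
        }
    }

    public static void main(String[] args) {
        for (TokenType t : TokenType.values()) {
            System.out.println(t + " -> " + describe(t)
                    + " / " + describeDesugared(t));
        }
    }
}
```

A JIT that inlines ordinal() and the array load presumably makes this
close to free; a simple interpreter is more likely to pay for the extra
call and load on every switch.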

If you can't support generating both, I'd agree with Jim Idle: support
the one that will go everywhere.  If, however, you could treat it the
way the C target treats switch vs. if/else, I'd think that'd be
nifty.  Doubly so because the maintenance burden is free when somebody
else is doing the work.  Since this affects the external API, though, I
would assume that an option to generate one or the other is a
non-starter.
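
For what it's worth, a sketch of how the two forms could sit side by
side in Java: plain int constants with a parallel name array as the
portable baseline, and an enum that wraps the same numbers where enums
are available.  Every name here (Types, NAMES, TokenType, fromInt, the
numbering) is hypothetical and is not what ANTLR generates.

```java
class DualTokenTypesSketch {
    // Primitive form: int constants plus a parallel array of descriptions.
    // This is the shape that ports to targets without enums.
    static final class Types {
        static final int IDENT = 1;
        static final int INT = 2;
        static final int SEMI = 3;
        static final String[] NAMES = { "<invalid>", "IDENT", "INT", "';'" };
    }

    // Enum form layered on top: each constant carries its int type, so
    // code that wants type safety can have it without a second numbering.
    enum TokenType {
        IDENT(Types.IDENT), INT(Types.INT), SEMI(Types.SEMI);

        final int type;

        TokenType(int type) { this.type = type; }

        // The "extra information" lives in the shared array.
        String description() { return Types.NAMES[type]; }

        // Rejects values such as 42 that are not token types.
        static TokenType fromInt(int type) {
            for (TokenType t : values()) {
                if (t.type == type) return t;
            }
            throw new IllegalArgumentException("not a token type: " + type);
        }
    }

    public static void main(String[] args) {
        int raw = Types.INT;
        TokenType typed = TokenType.fromInt(raw);
        System.out.println(raw + " " + Types.NAMES[raw]
                + " " + typed + " " + typed.description());
    }
}
```

The runtime would keep working on the ints, and the enum would be a
convenience for hand-written code around the parser.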


>
> I'd much rather use enums where available, though. I'd think any code
> generator could generate a simple int equivalent where enums don't
> exist, though. The only "gotcha" would be if we had the
> pattern/description properties, which would have to be represented as
> separate arrays in most languages. They aren't necessary though (but
> I'd love to have them)
> -- Scott
>
> ----------------------------------------
> Scott Stanchfield
> http://javadude.com
>
>
>
> On Wed, May 19, 2010 at 3:04 PM, Jim Idle <jimi at temporal-wave.com> wrote:
>> I also have doubts about the performance characteristics, and about the possibility of starting to rely on the target language to fill in gaps such as token numbering. We could get to the point where code generators cannot be built for more primitive languages because the schema is relying on the language to do things automatically.
>>
>> The generated code should be as primitive as possible, with the runtime being as maintainable and clear as possible while not sacrificing performance.
>>
>> Jim
>>
>>> -----Original Message-----
>>> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
>>> bounces at antlr.org] On Behalf Of Terence Parr
>>> Sent: Wednesday, May 19, 2010 11:35 AM
>>> To: antlr-interest interest
>>> Subject: Re: [antlr-interest] enums in v4 ANTLR Java code generation
>>> considered useless
>>>
>>>
>>> On May 18, 2010, at 2:58 PM, Scott Stanchfield wrote:
>>>
>>> > There are several advantages to enums:
>>> > * there is a discrete set of values that can be used (no accidental
>>> > 42's passed in when 42 isn't a token type)
>>> > * the enum value can carry extra information
>>> > * the enum values can override methods differently
>>>
>>> These are all excellent advantages. I believe that these mostly apply
>>> when you're writing code, not generating. Just like the compiler
>>> generates integers underneath, if antlr is generating integers, it's
>>> probably okay.
>>>
>>> > OH - one of the things that's clouding this is that you really don't
>>> > need the numeric type identifiers anymore. You can just have
>>> >
>>> >  public enum TokenType {
>>> >    IDENT, INT ...;
>>> >  }
>>> >
>>> > then in your match method:
>>> >
>>> >  void match(TokenType type) {
>>> >    if (LA(1).getType() == type) {
>>> >        ...
>>> >    }
>>> >  }
>>>
>>> The only problem is that match() lives up in the superclass in the
>>> library but the generated parser needs to define the enum.
>>>
>>> I also have the problem that I need to merge token types from multiple
>>> grammars for grammar imports. This gets more complicated with enum
>>> types without inheritance.
>>>
>>> >
>>> > And you can use the types in a switch statement:
>>> >
>>> >  switch(type) {
>>> >    case INT:
>>> >    case IDENT:
>>> >    ...
>>> >  }
>>> >
>>> > No more magic numbers! Woohoo!
>>>
>>> ANTLR already uses the labels when possible, such as INT. If you use a
>>> literal in your grammar such as ';' and don't label it in the lexer,
>>> then I have no choice but to generate the integer token type or a weird
>>> label like TOKEN34.
>>>
>>> Ter
>>>
>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-
>>> email-address

