[antlr-interest] performance and token declaration order

Jim Idle jimi at temporal-wave.com
Thu Jul 14 09:45:21 PDT 2011


The only effect that grouping has is in the lexer itself. If the tokens
that the parser will match as a set are declared in one contiguous block
in the lexer, they are assigned contiguous token numbers, and the
generated code for the parser match is much simpler. So, for the rule
quoted below, the order of the parser alternatives does not matter, but
if you declare the tokens in a block in the lexer, the parser will be
looking for a contiguous range and the resulting code will be a simple
range check, an optimal switch, etc.

The most obvious example is to list reserved words in a block followed by
keywords (language words that can also be used as identifier names). Then
you can use a parser rule like this:

id: ID | KEY1 | KEY2 ...

And lexer rules:

// Reserved
//
RES1 : 'RES1';
...

// Keywords
//
KEY1 : 'KEY1' ;
KEY2 : 'KEY2' ;
...

ID : ('A'..'Z')+;  // Declaring ID right after the keywords keeps the range contiguous.
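
To illustrate why contiguous numbers help, here is a rough Java sketch of
the two kinds of membership test. It is not actual ANTLR output: the class
name, the method names and all token numbers are made up for the example,
and it only assumes that token numbers follow declaration order in the
lexer.

```java
// Illustrative sketch only, not code generated by ANTLR.
// All token numbers below are hypothetical.
public class TokenRangeSketch {

    // Case 1: keywords and ID declared in one block, so they are contiguous.
    static final int RES1 = 4; // reserved word, not usable as an identifier
    static final int KEY1 = 5;
    static final int KEY2 = 6;
    static final int ID   = 7;

    // The "id" rule's token set is KEY1..ID, so one range check is enough.
    static boolean isIdContiguous(int type) {
        return type >= KEY1 && type <= ID;
    }

    // Case 2: the same tokens declared far apart, with other tokens between.
    static final int S_KEY1 = 5;
    static final int S_RES1 = 9;
    static final int S_KEY2 = 12;
    static final int S_ID   = 20;

    // No single range covers the set, so every member is listed explicitly.
    static boolean isIdScattered(int type) {
        switch (type) {
            case S_KEY1:
            case S_KEY2:
            case S_ID:
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isIdContiguous(KEY2));   // keyword inside the range
        System.out.println(isIdContiguous(RES1));   // reserved word outside it
        System.out.println(isIdScattered(S_KEY2));  // listed member
        System.out.println(isIdScattered(S_RES1));  // not a member
    }
}
```

Both tests accept the same tokens; the difference is only in how much code
is needed to express the set.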


You can also keep the token numbers contiguous by using your own
tokenVocab file of course.

Jim



> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Sébastien Kirche
> Sent: Thursday, July 14, 2011 9:19 AM
> To: antlr-interest
> Subject: [antlr-interest] performance and token declaration order
>
> Hi,
>
> considering that the tokens are processed by the parser in the order
> they are listed in the grammar, and looking at the generated code, does
> it make sense to list the alternatives in the order of higher to lower
> frequency ?
>
> For example, considering the following rule :
> dataType
> 	: Any
> 	| Blob
> 	| Boolean
> 	| Byte
> 	| Char
> 	| DateTime
> 	| Date
> 	| Dec
> 	| Double
> 	| Int
> 	| LongLong
> 	| Long
> 	| Real
> 	| String
> 	| Time
> 	| UInt
> 	| ULong ;
>
> I have put the different types in the order they are listed in the
> language help file. But while knowing that I have far more longs,
> integers and strings than bytes or dates (and theoretically no Reals
> for example), should I move the most used types at the beginning ? I
> did not find an answer in the FAQ yet.
>
> Regards.
> --
> Sébastien Kirche
>

