[antlr-interest] Inefficiency in lexer

Bryan Ewbank ewbank at gmail.com
Fri May 27 10:56:02 PDT 2005


Why is this inefficient?  LA is called once in both cases, and the
compiler can convert the switch into a lookup table, which is usually
faster than the multiple comparisons in the alternative code.
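
To make the comparison concrete, here is a rough sketch of the two
strategies in plain C. It is not ANTLR output, and all of the names
(is_text_char_compare, is_text_char_table, text_table) are made up for
illustration. The table version does one bounds check and one array
load per character, while the comparison version may evaluate up to
nine tests. Whether a given compiler actually turns a switch into a
table like this depends on the compiler and its optimization settings.

```c
#include <stdbool.h>

/* Hypothetical helper: the chained-comparison form from the quoted mail.
   Assumes a character set where digits and letters are contiguous
   (true for ASCII). */
static bool is_text_char_compare(int c) {
    return c == '\n' || c == '\r' || c == ' '
        || (c >= '0' && c <= '9')
        || (c >= 'A' && c <= 'Z')
        || (c >= 'a' && c <= 'z');
}

/* Hypothetical table: one entry per 8-bit character value,
   nonzero if the character is in the set. */
static unsigned char text_table[256];
static bool text_table_ready = false;

/* Fill the table from the same character list as the switch cases. */
static void build_text_table(void) {
    static const char members[] =
        "\n\r 0123456789"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz";
    for (const char *p = members; *p != '\0'; ++p) {
        text_table[(unsigned char)*p] = 1;
    }
    text_table_ready = true;
}

/* Table form: a bounds check plus a single indexed load. */
static bool is_text_char_table(int c) {
    if (!text_table_ready) {
        build_text_table();
    }
    return c >= 0 && c < 256 && text_table[c] != 0;
}
```

Both functions accept exactly the same characters, so they can be
swapped freely when timing one against the other.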

- Bryan

On 5/27/05, ttest <ttest at gmx.de> wrote:
> Hi,
> 
> While looking through my generated lexer code I came across the following
> switch statement, which is unnecessarily inefficient.
> 
> switch ( LA(1)) {
> case '\n':  case '\r':  case ' ':  case '0':
> case '1':  case '2':  case '3':  case '4':
> case '5':  case '6':  case '7':  case '8':
> case '9':  case 'A':  case 'B':  case 'C':
> case 'D':  case 'E':  case 'F':  case 'G':
> case 'H':  case 'I':  case 'J':  case 'K':
> case 'L':  case 'M':  case 'N':  case 'O':
> case 'P':  case 'Q':  case 'R':  case 'S':
> case 'T':  case 'U':  case 'V':  case 'W':
> case 'X':  case 'Y':  case 'Z':  case 'a':
> case 'b':  case 'c':  case 'd':  case 'e':
> case 'f':  case 'g':  case 'h':  case 'i':
> case 'j':  case 'k':  case 'l':  case 'm':
> case 'n':  case 'o':  case 'p':  case 'q':
> case 'r':  case 's':  case 't':  case 'u':
> case 'v':  case 'w':  case 'x':  case 'y':
> case 'z':
> {
>         mText(true);
>         theRetToken=_returnToken;
>         break;
> }
> 
> A better alternative, which could also be generated easily from character
> classes written with the .. range operator (e.g. 'a'..'z'), would be the
> following.
> 
> char c = LA(1);
> if( c=='\n' || c=='\r' || c==' '
>         || (c>='0' && c<='9')
>         || (c>='A' && c<='Z')
>         || (c>='a' && c<='z')
>         )
> {
>         mText(true);
>         theRetToken=_returnToken;
>         break;
> }
> 
> Greets,
> 
> Christian
> 
>

