[antlr-interest] Inefficiency in lexer

Fri May 27 12:19:34 PDT 2005

Hi again,

I'm not sure if Java is really that smart. I would think that LA(1) is
compared to every character one at a time which would be inefficient.

What would the lookup table look like?

I guess the key in the table would be the char returned by LA(1) and what
would the values be in that table? The memory location where execution
continues?

Then the lookup table would be 2^16*6=393216=390K bytes big which would be
pretty much.

Christian

-----Original Message-----
Message: 7
Date: Fri, 27 May 2005 13:56:02 -0400
From: Bryan Ewbank <ewbank at gmail.com>
Subject: Re: [antlr-interest] Inefficiency in lexer
To: ANTLR Interest <antlr-interest at antlr.org>
Message-ID: <dd3a065f050527105610207ae8 at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Why's this inefficient?  LA is called once in both cases, and the
switch can be converted to a lookup table that is faster than the
multiple comparisons of the alternate code.

- Bryan

On 5/27/05, ttest <ttest at gmx.de> wrote:
> Hi,
> 
> while looking thru my generated lexer code I came across the following
> switch statement which is unnecessarily inefficient.
> 
> switch ( LA(1)) {
> case '\n':  case '\r':  case ' ':  case '0':
> case '1':  case '2':  case '3':  case '4':
> case '5':  case '6':  case '7':  case '8':
> case '9':  case 'A':  case 'B':  case 'C':
> case 'D':  case 'E':  case 'F':  case 'G':
> case 'H':  case 'I':  case 'J':  case 'K':
> case 'L':  case 'M':  case 'N':  case 'O':
> case 'P':  case 'Q':  case 'R':  case 'S':
> case 'T':  case 'U':  case 'V':  case 'W':
> case 'X':  case 'Y':  case 'Z':  case 'a':
> case 'b':  case 'c':  case 'd':  case 'e':
> case 'f':  case 'g':  case 'h':  case 'i':
> case 'j':  case 'k':  case 'l':  case 'm':
> case 'n':  case 'o':  case 'p':  case 'q':
> case 'r':  case 's':  case 't':  case 'u':
> case 'v':  case 'w':  case 'x':  case 'y':
> case 'z':
> {
>         mText(true);
>         theRetToken=_returnToken;
>         break;
> }
> 
> A better alternative which could also be easily generated from character
> classes using .. i. e. 'a'..'z' would be the following.
> 
> char c = LA(1);
> if( c=='\n' || c=='\r' || c==' '
>         || (c>='0' && c<='9')
>         || (c>='A' && c<='Z')
>         || (c>='a' && c<='z')
>         )
> {
>         mText(true);
>         theRetToken=_returnToken;
>         break;
> }
> 
> Greets,
> 
> Christian
> 
>

------------------------------

_______________________________________________
antlr-interest mailing list
antlr-interest at antlr.org
http://www.antlr.org/mailman/listinfo/antlr-interest

End of antlr-interest Digest, Vol 6, Issue 47
*********************************************