[antlr-interest] C++, Rick, Not optimal Lexer's code.

Thu Oct 28 06:41:03 PDT 2004

(just back from holidays, working through loads and loads of mail (and spam
:(( )

On Tue, Oct 19, 2004 at 09:47:08AM +0300, Ruslan Zasukhin wrote:
> I think I see non-optimal code in nextToken().
> It looks like this:
>
> antlr::RefToken VSQL_Lexer::nextToken()
> {
>     antlr::RefToken theRetToken;
>     for (;;) {
>         antlr::RefToken theRetToken;
>         int _ttype = antlr::Token::INVALID_TYPE;
>         resetText();
>         try {   // for lexical and char stream error handling
>
>             switch ( LA(1)) {   <<<<<<<<<<<<<<<<<<< this is good.
>
>             case static_cast<unsigned char>('('):
>             {
>                 mLPAREN(true);
>                 theRetToken=_returnToken;
>                 break;
>             }
>             default:
>
>                 else if ((LA(1) == static_cast<unsigned char>('-')) &&
> (LA(2) == static_cast<unsigned char>('>'))) {
>                     mPTR(true);
>                     theRetToken=_returnToken;
>                 }
>                 else if ((LA(1) == static_cast<unsigned char>('<')) &&
> (LA(2) == static_cast<unsigned char>('>'))) {
>                     mNE(true);
>                     theRetToken=_returnToken;
>                 }
>                 else if ((LA(1) == static_cast<unsigned char>('>')) &&
> (LA(2) == static_cast<unsigned char>('='))) {
>                     mGE(true);
>                     theRetToken=_returnToken;
>                 }
>                 else if ((LA(1) == static_cast<unsigned char>('<')) &&
> (LA(2) == static_cast<unsigned char>('='))) {
>                     mLE(true);
>                     theRetToken=_returnToken;
>                 }
>             }
>
> -----------------------------------------------------------------
> I think that in the default clause we should also use a switch.

Good point. I'm not sure, but this may also be affected by the
genswitchthreshold options (or whatever they are called, see options.html in
the docs). I'm not sure whether it's easy to tweak the 2.7.x code generator
to do this; it might be, it might not.
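
For illustration only, here is a minimal hand-written sketch of what a
switch-based dispatch for the default clause could look like. It is not
output of the ANTLR code generator: the Tok enum and the la() and classify()
helpers are hypothetical stand-ins for the generated token types, LA(i), and
nextToken(). Inputs that match none of the five tokens simply yield UNKNOWN
here, whereas the real lexer would go on to try its other rules.

```cpp
#include <cstddef>
#include <string>

// Hypothetical token kinds for this sketch; not ANTLR's generated constants.
enum class Tok { LPAREN, PTR, NE, GE, LE, UNKNOWN };

// Stand-in for the lexer's LA(i): the i-th character ahead of pos (1-based),
// or '\0' once the lookahead runs past the end of the input.
inline char la(const std::string& in, std::size_t pos, std::size_t i) {
    const std::size_t idx = pos + i - 1;
    return idx < in.size() ? in[idx] : '\0';
}

// Dispatch on LA(1) with a switch, and look at LA(2) only in the branches
// that need it, instead of re-testing LA(1) in a chain of else-ifs.
Tok classify(const std::string& in, std::size_t pos) {
    const char c1 = la(in, pos, 1);  // first lookahead character, read once
    switch (c1) {
    case '(':
        return Tok::LPAREN;
    case '-':
        return la(in, pos, 2) == '>' ? Tok::PTR : Tok::UNKNOWN;
    case '>':
        return la(in, pos, 2) == '=' ? Tok::GE : Tok::UNKNOWN;
    case '<':
        switch (la(in, pos, 2)) {    // second lookahead character, read once
        case '>': return Tok::NE;
        case '=': return Tok::LE;
        default:  return Tok::UNKNOWN;
        }
    default:
        return Tok::UNKNOWN;
    }
}
```

Calling classify("<=", 0) takes the '<' branch and then the '=' branch, so
each lookahead character is fetched at most once per token.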

> In this way the code becomes more natural and clean.
> But the main point is that we now call LA(1) only ONCE instead of 30-40
> times (in my case).

I once had a prototype that cached the LA() calls. It was nice up to a
point, but it didn't work for all cases without really serious changes to
the code generator (changes that would probably affect the other code
generators as well). Tinkering with the structure of the 2.7 code
generators is hairy business. I'm not saying that 3.0 will be perfect; it
will probably suffer from some imposed structure as well, but it is much,
much easier to tinker with the code it generates. After tinkering with the
3.0 codegen, I'd rather spend time on the 3.0 prototype than on
optimizing 2.7. The code generated by the 3.0 C prototype looks really
good, and I now have an even better one than I initially built for the
workshop (less portable, though, because it uses mmap for input files).

Of course patches are welcome ;)

Cheers,

Ric
--
-----+++++*****************************************************+++++++++-------
    ---- Ric Klaren ----- j.klaren at utwente.nl ----- +31 53 4893755  ----
-----+++++*****************************************************+++++++++-------
  "Of all the things I've lost I miss my mind the most" --- Ozzy Osbourne
