[antlr-interest] Re: Local lookahead depth

lgcraymer lgc at mail1.jpl.nasa.gov
Sun Nov 9 22:13:34 PST 2003


--- In antlr-interest at yahoogroups.com, Oliver Zeigermann 
<oliver at z...> wrote:
> Admittedly, this is not a very practical example, but consider the 
> following grammar:
> 
> {
>      int cnt = 0;
> }
> 
> LANGUAGE
>      : ( SHORTWORD ) => SHORTWORD { System.out.println("SHORT"); }
>      | LONGWORD { System.out.println("LONG"); }
>      ;
> 
> protected SHORTWORD : { cnt = 0; } ( {cnt < 1000}? '*' { cnt++; } )+ '#' ;
> protected LONGWORD : { cnt = 0; } ( {cnt < 10000}? '*' { cnt++; } )+ '#' ;
> 
> It describes a language with two words:
> 1.) SHORTWORD: exactly 1000 '*' followed by a single '#'
> 2.) LONGWORD: exactly 10000 '*' followed by a single '#'

Personally, I prefer:

STAR_WORD
{ int count = 0; }
    :
    ( '*' { count++; } )+ '#'
    { if (count == 1000)
          $setType(SHORTWORD);
      else if (count == 10000)
          $setType(LONGWORD);
    }
    ;

and I'd be happier if the setType were not in an action but directly 
supported by ANTLR syntax.
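
For anyone who wants to see the counting idea outside of a grammar, here is a minimal sketch in plain Java. It is not generated ANTLR code; the class name StarWords, the TokenType enum, and the classify method are made up for illustration. It assumes that a word with any other number of stars simply keeps the rule's own type (STAR_WORD), which is what happens in the rule above when neither branch calls $setType.

```java
import java.util.Optional;

// Hypothetical helper, not part of ANTLR: mirrors the STAR_WORD rule by
// counting first and deciding the token type only after the '#' is seen.
final class StarWords {
    enum TokenType { SHORTWORD, LONGWORD, STAR_WORD }

    // Returns empty if the text is not one or more '*' followed by a single '#'.
    static Optional<TokenType> classify(String text) {
        int count = 0;
        int i = 0;
        // Match ( '*' { count++; } )+
        while (i < text.length() && text.charAt(i) == '*') {
            count++;
            i++;
        }
        // Require at least one star, then '#' as the last character.
        if (count == 0 || i != text.length() - 1 || text.charAt(i) != '#') {
            return Optional.empty();
        }
        // Equivalent of the $setType(...) action at the end of the rule.
        if (count == 1000) {
            return Optional.of(TokenType.SHORTWORD);
        }
        if (count == 10000) {
            return Optional.of(TokenType.LONGWORD);
        }
        return Optional.of(TokenType.STAR_WORD); // assumed default type
    }
}
```

The point of the design is that no lookahead or predicate is needed: the decision is deferred until the whole word has been consumed.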


> While there are certainly other grammars that describe this language, this 
> one seems to be the most natural, but it does not work, because semantic 
> predicates (like {cnt < 1000}?) rely on semantic actions ({ cnt++; }, 
> { cnt = 0; }).
> 
> > 
> >>2.) Sometimes using tree transformation is too expensive
> > 
> > 
> > Sometimes it is overkill (unnecessary development), but too 
> > expensive?  I doubt it, especially for languages where lexing and 
> > parsing are complex.  [BTW, my experience is that unsubstantiated 
> > performance arguments are usually bogus and made in an attempt to 
> > subjectively win an argument that cannot be won on the basis of 
> > objective evidence.]
> 
> I have the same experience. But consider extremely large amounts of 
> input to be parsed. In this case it is prohibitive to generate an AST 
> because of the memory issue. As a very practical example I have parsing 
> of the AMM (Aircraft Maintenance Manual), which is available in SGML 
> (very hard to parse, really). I parsed this a few years ago using ANTLR, 
> but its size normally is around 100MB. A few years ago my machine had 
> 128MB of RAM! You see what I mean?

And how much disk space did you have?  On a UNIX box, mmap() is a 
good way of automating file I/O, but even on systems without virtual 
memory, you can fake it. Performance is not an issue--with a problem 
of this size, nothing stays in the processor cache, and the overhead 
of the disk writes will be only a few percent.
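
To make the mmap() suggestion concrete, here is a rough sketch using Java's FileChannel.map, which is the closest standard equivalent I know of on the Java side. It only shows the mechanism on a read-only input file: the OS pages data in and out on demand, so the file never has to fit in the heap. The class name MappedScan, the countByte method, and the 1 GiB window size are illustrative assumptions. Backing an AST with a mapped file would additionally need a writable mapping and a node layout of your own, which this does not attempt.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical example: scan a large file through memory-mapped windows.
final class MappedScan {
    // Counts occurrences of one byte value without reading the file into the heap.
    static long countByte(Path file, byte target) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            final long window = 1L << 30; // assumed window size; one map() call is limited to 2 GiB
            long size = ch.size();
            long count = 0;
            long pos = 0;
            while (pos < size) {
                long len = Math.min(window, size - pos);
                // The OS decides which pages are resident; we just read the buffer.
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, pos, len);
                while (buf.hasRemaining()) {
                    if (buf.get() == target) {
                        count++;
                    }
                }
                pos += len;
            }
            return count;
        }
    }
}
```

A call such as MappedScan.countByte(path, (byte) '*') walks the whole file one window at a time, however large the file is.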

--Loring


> Oliver


 
