[antlr-interest] Re: Local lookahead depth
lgcraymer
lgc at mail1.jpl.nasa.gov
Sun Nov 9 22:13:34 PST 2003
--- In antlr-interest at yahoogroups.com, Oliver Zeigermann
<oliver at z...> wrote:
> Admitted, this is not a really practical example, but consider the
> following grammar:
>
> {
> int cnt = 0;
> }
>
> LANGUAGE
> : ( SHORTWORD ) => SHORTWORD { System.out.println("SHORT"); }
> | LONGWORD { System.out.println("LONG"); }
> ;
>
> protected SHORTWORD : { cnt = 0; } ( {cnt < 1000}? '*' { cnt++; } )+ '#' ;
> protected LONGWORD : { cnt = 0; } ( {cnt < 10000}? '*' { cnt++; } )+ '#' ;
>
> It describes a language with two words:
> 1.) SHORTWORD: exactly 1000 '*' followed by a single '#'
> 2.) LONGWORD: exactly 10000 '*' followed by a single '#'
Personally, I prefer:
STAR_WORD
    { int count = 0; }
    :
    ( '*' { count++; } )+ '#'
    {
        if (count == 1000)
            $setType(SHORTWORD);
        else if (count == 10000)
            $setType(LONGWORD);
    }
    ;
and I'd be happier if the setType were not in an action but directly
supported by ANTLR syntax.
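
For readers without ANTLR at hand, the same "match first, count, then pick the token type" idea can be sketched in plain Java. This is only an illustration under assumptions: the class name StarWordClassifier, the Kind enum, and the OTHER fallback are hypothetical and not part of ANTLR or of the grammar above, which leaves the token type as STAR_WORD when neither count matches.

```java
// Hypothetical hand-written equivalent of the STAR_WORD rule: consume the
// stars without any predicate, then decide the token kind from the count.
class StarWordClassifier {
    enum Kind { SHORTWORD, LONGWORD, OTHER }

    static Kind classify(String input) {
        int count = 0;
        // ( '*' { count++; } )+
        while (count < input.length() && input.charAt(count) == '*') {
            count++;
        }
        // Require at least one '*' and exactly one trailing '#'.
        if (count == 0 || count != input.length() - 1 || input.charAt(count) != '#') {
            throw new IllegalArgumentException("not a star word");
        }
        // The decision happens after matching, so no lookahead is needed.
        if (count == 1000) return Kind.SHORTWORD;
        if (count == 10000) return Kind.LONGWORD;
        return Kind.OTHER; // assumption: any other length keeps a generic type
    }

    public static void main(String[] args) {
        String stars = new String(new char[1000]).replace('\0', '*');
        System.out.println(classify(stars + "#"));
    }
}
```

The point of the design is that the count is only inspected once the whole word has been consumed, so nothing depends on actions running during a syntactic predicate.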
> While there are certainly other grammars that describe this language, this
> one seems to be the most natural, but does not work, because semantic
> predicates (like {cnt < 1000}?) rely on semantic actions ({ cnt++; },
> { cnt = 0; }).
>
> >
> >>2.) Sometimes using tree transformation is too expensive
> >
> >
> > Sometimes it is overkill (unnecessary development), but too
> > expensive? I doubt it, especially for languages where lexing and
> > parsing are complex. [BTW, my experience is that unsubstantiated
> > performance arguments are usually bogus and made in an attempt to
> > subjectively win an argument that cannot be won on the basis of
> > objective evidence.]
>
> I have the same experience. But consider extremely large amounts of
> input to be parsed. In this case it is prohibitive to generate an AST
> because of the memory issue. As a very practical example I have parsing
> of the AMM (Aircraft Maintenance Manual), which is available in SGML
> (very hard to parse, really). I parsed this a few years ago using ANTLR,
> but its size normally is around 100MB. A few years ago my machine had
> 128MB of RAM! You see what I mean?
And how much disk space did you have? On a UNIX box, mmap() is a
good way of automating file I/O, but even on systems without virtual
memory, you can fake it. Performance is not an issue--with a problem
of this size, nothing stays in the processor cache, and the overhead
of the disk writes will be only a few percent.
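
As a rough illustration of the mmap() idea in Java, the sketch below keeps tree nodes as fixed-size records in a memory-mapped file, so the operating system pages them to and from disk. Everything here is an assumption made for the example: the class name MappedNodeStore, the three-int record layout (type, first child, next sibling), and the fixed capacity are hypothetical and have nothing to do with ANTLR's own AST classes. Only FileChannel.map and MappedByteBuffer are standard java.nio calls.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical file-backed node store: each node is three ints
// (type, firstChild, nextSibling); -1 means "no node".
class MappedNodeStore implements AutoCloseable {
    private static final int RECORD_BYTES = 12;
    private final FileChannel channel;
    private final MappedByteBuffer buf;
    private int size = 0;

    // capacity * 12 must not exceed Integer.MAX_VALUE for a single mapping.
    MappedNodeStore(Path file, int capacity) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
        buf = channel.map(FileChannel.MapMode.READ_WRITE, 0,
                (long) capacity * RECORD_BYTES);
    }

    // Appends a node record and returns its index.
    int add(int type, int firstChild, int nextSibling) {
        int base = size * RECORD_BYTES;
        buf.putInt(base, type);
        buf.putInt(base + 4, firstChild);
        buf.putInt(base + 8, nextSibling);
        return size++;
    }

    int type(int node) { return buf.getInt(node * RECORD_BYTES); }
    int firstChild(int node) { return buf.getInt(node * RECORD_BYTES + 4); }
    int nextSibling(int node) { return buf.getInt(node * RECORD_BYTES + 8); }

    void setFirstChild(int node, int child) {
        buf.putInt(node * RECORD_BYTES + 4, child);
    }

    @Override
    public void close() throws IOException {
        buf.force();     // flush dirty pages to the file
        channel.close();
    }
}
```

Nodes refer to each other by index rather than by object reference, which is what lets the whole tree live in the mapped file instead of on the Java heap.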
--Loring
> Oliver