[antlr-interest] Re: Local lookahead depth

Oliver Zeigermann oliver at zeigermann.de
Sun Nov 9 21:40:21 PST 2003


lgcraymer wrote:
>>> Also, as to actions in lookahead code:  this is something that
>>> Ter supported in PCCTS under the name "guarded predicates" or some
>>> such.  I don't know that it saw much use, and I suspect that
>>> usage indicates a too early incorporation of semantic information
>>> into the translator--tree transformation helps avoid that.
>>
>> 1.) You might really increase the set of parseable languages using
>> this technique
>
> Possible, but I don't really see how.  You would have to have
> extreme interdependency between semantics and syntax at the very
> least.

I had an example that was much more of theoretical interest (I do not 
recall it right now, but could look it up). On the more practical side, 
some grammars might look nicer if you had such a feature.

Admittedly, this is not a really practical example, but consider the 
following grammar:

{
     int cnt = 0;
}

LANGUAGE
     : ( SHORTWORD ) => SHORTWORD { System.out.println("SHORT"); }
     | LONGWORD { System.out.println("LONG"); }
     ;

protected SHORTWORD : { cnt = 0; } ( {cnt < 1000}? '*' { cnt++; } )+ {cnt == 1000}? '#' ;
protected LONGWORD : { cnt = 0; } ( {cnt < 10000}? '*' { cnt++; } )+ {cnt == 10000}? '#' ;

It describes a language with two words:
1.) SHORTWORD: exactly 1000 '*' followed by a single '#'
2.) LONGWORD: exactly 10000 '*' followed by a single '#'

While there are certainly other grammars that describe this language, this 
one seems to be the most natural. However, it does not work, because the 
semantic predicates (like {cnt < 1000}?) rely on semantic actions 
({ cnt++; } and { cnt = 0; }).
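To make the point concrete, here is a minimal hand-written sketch in plain Java of what the guarded lookahead would have to do for the language described above (exactly n '*' then a '#'). It is not ANTLR-generated code; matchesWord and classify are made-up names. The key is that the counter state is updated *while scanning ahead*, before the parser commits to an alternative:

```java
// Hand-written sketch of "actions during lookahead": the counter is
// mutated while speculatively scanning, before committing to SHORTWORD.
public class GuardedLookahead {

    // Match exactly n '*' characters followed by a single '#'.
    static boolean matchesWord(String in, int n) {
        int cnt = 0;                     // corresponds to { cnt = 0; }
        int i = 0;
        while (i < in.length() && in.charAt(i) == '*') {
            if (cnt >= n) return false;  // predicate {cnt < n}? fails mid-lookahead
            cnt++;                       // action { cnt++; } fired during lookahead
            i++;
        }
        // validating check for "exactly n stars", then the closing '#'
        return cnt == n && i == in.length() - 1 && in.charAt(i) == '#';
    }

    // ( SHORTWORD ) => SHORTWORD | LONGWORD, with the counter alive
    // during the speculative SHORTWORD scan.
    static String classify(String in) {
        if (matchesWord(in, 1000))  return "SHORT";
        if (matchesWord(in, 10000)) return "LONG";
        return "NO MATCH";
    }

    public static void main(String[] args) {
        System.out.println(classify("*".repeat(1000) + "#"));   // SHORT
        System.out.println(classify("*".repeat(10000) + "#"));  // LONG
    }
}
```

A generated lexer cannot do this if actions are suppressed while evaluating the syntactic predicate, which is exactly why the grammar above fails.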

>> 2.) Sometimes using tree transformation is too expensive
>
> Sometimes it is overkill (unnecessary development), but too 
> expensive?  I doubt it, especially for languages where lexing and 
> parsing are complex.  [BTW, my experience is that unsubstantiated 
> performance arguments are usually bogus and made in an attempt to 
> subjectively win an argument that cannot be won on the basis of 
> objective evidence.]

I have the same experience. But consider extremely large amounts of 
input to be parsed. In that case it is prohibitive to generate an AST 
because of the memory it requires. As a very practical example, I have 
the parsing of the AMM (Aircraft Maintenance Manual), which is available 
in SGML (very hard to parse, really). I parsed it a few years ago using 
ANTLR; its size is normally around 100MB. A few years ago my machine had 
128MB of RAM! You see what I mean?
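The alternative to building a tree can be sketched like this (a hypothetical example, not the actual AMM tooling): when results are emitted while the input is consumed as a stream, memory stays constant no matter how large the document is, whereas an AST would have to hold the whole thing at once.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical sketch: stream-process the input one character at a
// time, emitting results immediately; nothing is retained beyond the
// current character, so a 100MB input needs no 100MB data structure.
public class StreamingCount {

    // Count '#' word terminators in the stream.
    static long countWords(Reader in) throws IOException {
        long words = 0;
        int c;
        while ((c = in.read()) != -1) {
            if (c == '#') words++;
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countWords(new StringReader("***#**#")));  // 2
    }
}
```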

Oliver



 
