[antlr-interest] V3 grammar writing tactics

Mon Dec 18 12:09:29 PST 2006

Jim,

Thanks for your reply. I am about to try your suggestion of a new input
stream class; I have since noticed that Ter did suggest this approach to
case insensitivity somewhere or other. It will certainly simplify the
grammar greatly.
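
To make the idea concrete, here is a minimal sketch of the technique, kept
deliberately independent of the ANTLR runtime: lookahead is folded to upper
case for matching, while the stored text keeps its original case. The class
name NoCaseLookaheadSketch, its EOF constant and its substring helper are
stand-ins invented for this illustration and are not part of ANTLR; the real
class would extend ANTLRFileStream instead.

```java
/** Sketch only: upper-cased lookahead over text that keeps its original case. */
public class NoCaseLookaheadSketch {
    /** Stand-in for the end-of-input marker (an assumption of this sketch). */
    public static final int EOF = -1;

    private final char[] data; // original characters, never modified
    private int p = 0;         // index of the next character to consume

    public NoCaseLookaheadSketch(String input) {
        data = input.toCharArray();
    }

    /** 1-based lookahead; LA(-1) is the character just consumed. */
    public int LA(int i) {
        if (i == 0) {
            return 0; // undefined
        }
        if (i < 0) {
            i++; // translate LA(-1) to offset 0
        }
        int index = p + i - 1;
        if (index < 0 || index >= data.length) {
            return EOF;
        }
        // Only the value handed to the matcher is folded to upper case.
        return Character.toUpperCase(data[index]);
    }

    public void consume() {
        if (p < data.length) {
            p++;
        }
    }

    /** Original text between two indexes, both inclusive. */
    public String substring(int start, int stop) {
        return new String(data, start, stop - start + 1);
    }

    public static void main(String[] args) {
        NoCaseLookaheadSketch in = new NoCaseLookaheadSketch("Select x");
        StringBuilder seen = new StringBuilder();
        while (in.LA(1) != EOF && in.LA(1) != ' ') {
            seen.append((char) in.LA(1));
            in.consume();
        }
        System.out.println(seen);               // prints SELECT
        System.out.println(in.substring(0, 5)); // prints Select
    }
}
```

With a stream like this, the lexer rules would need to spell their keywords
in upper case, since that is all the lookahead ever reports.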

As regards my original question, it is fairly clear that the problem lay
in the k=2 vs. k* choice. (Murphy says: pose a properly formulated
question and the answer is obvious!) Accepting the former restriction
brings analysis times back down to seconds again, with no other change
to the grammar. I had already taken the heap setting up as far as I
could, with no result except that the JVM took longer to fall over. As
you said, there are certainly some cases where a loop "close to
infinite" can be produced, outside ANTLRWorks as well as within.
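
When raising the heap limit, it may be worth confirming that the setting
actually reached the JVM. A small check, assuming the limit is passed with
the standard -Xmx option and that the check is launched with the same
options as the ANTLR run; the class name HeapCheck is made up for this
illustration, and the figure reported can be somewhat lower than the value
requested, depending on the JVM:

```java
/** Sketch only: report the maximum heap this JVM will attempt to use. */
public class HeapCheck {
    /** Maximum heap in whole megabytes, as reported by the running JVM. */
    public static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("max heap (MB): " + maxHeapMegabytes());
    }
}
```

Run it as, for example, java -Xmx1000M HeapCheck and compare the figure with
the one requested.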

The best news for me was that my grammars, in the state they were in,
did prove to be free from ambiguities and warnings.

BUT this raises another question: why should there be any difference
between k-limited and k* behaviour on identical, error-free grammars? Ter
seems to imply that the lookahead will be limited to that necessary to
eliminate ambiguity, but this cannot be happening in my case. Is it that
the extra lookahead brings so many new possibilities for ambiguity that
a closed loop becomes sufficiently "close to infinite" to exhaust any
usable heap size, since recursion gives memory requirements that grow as
N^(some positive number)?

Anyway, thanks again as I can now progress once more,
Andrew Smith

Jim Idle wrote:
> Andrew,
> 
> A couple of points for you.
> 
> Rather than try to use lexer tokens for case insensitivity, try the code
> below this text. This will use upper case only to recognize tokens, but
> will preserve case in the input stream and is probably what most people
> want to do.
> 
> Perhaps recent grammar analysis changes have bitten you on this one. I
> know some bugs have been fixed and some others tweaked since the release
> you are using, but I am not sure exactly which.
> 
> Assuming that the analysis will eventually finish and not just loop
> forever (ANTLRWorks does this on certain strange inputs right now), then
> use the command line option -Xmx1000M on your java invocation
> (assuming Windows; you may need to consult the java command line for your
> actual system, etc.).
> 
> Jim
> 
> 
> Use the following where you would normally use an ANTLRFileStream, to
> create a case insensitive lexer which preserves case in the input
> stream:
> 
> import org.antlr.runtime.*;
> import java.io.*;
> 
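> /**
>  * A file stream whose lookahead (LA) is upper-cased for token matching,
>  * while the text stored in the stream keeps its original case.
>  *
>  * @author jimi
>  */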
> public class ANTLRNoCaseFileStream  extends ANTLRFileStream
> {
> 	public ANTLRNoCaseFileStream(String fileName) throws IOException {
> 		super(fileName, null);
> 	}
> 
> 	public ANTLRNoCaseFileStream(String fileName, String encoding) throws IOException {
> 		super(fileName, encoding);
> 	}
> 	    
> 	public int LA(int i) {
> 		if ( i==0 ) {
> 			return 0; // undefined
> 		}
> 		if ( i<0 ) {
> 			i++; // e.g., translate LA(-1) to use offset 0
> 			if ( (p+i-1) < 0 ) {
> 				return CharStream.EOF; // no character before the first one
> 			}
> 		}
> 
> 		if ( (p+i-1) >= n ) {
> 			//System.out.println("char LA("+i+")=EOF; p="+p);
> 			return CharStream.EOF;
> 		}
> 		//System.out.println("char LA("+i+")="+data[p+i-1]+"; p="+p);
> 		// Upper-case only the lookahead; the underlying data keeps its case.
> 		return Character.toUpperCase(data[p+i-1]);
> 	}
>