[antlr-interest] Understanding lookahead

Wed Jun 6 10:46:56 PDT 2007

I'm trying to understand how ANTLR's lookahead mechanisms work using  
this grammar:

   grammar Simple;

   FOO: BAR ':' BAZ {System.out.println("FOO");};
   fragment BAR: 'bar' {System.out.println("BAR");};
   fragment BAZ: 'baz' {System.out.println("BAZ");};
   EVERYTHING_ELSE: . {System.out.println("EVERYTHING_ELSE");};

   thing: .* EOF {System.out.println("done");};

I basically wanted to explore the way lookahead works and what ANTLR  
does when its lookahead predictions fail. For example, given the  
following inputs:

- "bar:baz": recognizes this as a FOO token
- "bar:ba": predicts FOO and complains about missing "z"
- "bar:b": predicts FOO and complains about missing "a"
- "bar:": predicts FOO and complains about missing "b"
- "bar": predicts FOO and complains about missing ":"
- "ba": predicts FOO and complains about missing "r"
- "b": accepts input as EVERYTHING_ELSE
- "...ba": accepts the periods as EVERYTHING_ELSE, then predicts FOO
and complains about missing "r"

This exercise was very helpful for me in seeing how ANTLR's lookahead
operates: basically, as soon as it has seen enough input to predict
the presence of a particular token ("ba" is enough in this case), it
commits to that token, keeps matching, and raises an exception if the
remaining input doesn't meet its expectations.
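
To check my understanding of that predict-then-commit behaviour, here
is a toy model in plain Java. It is only a sketch of the behaviour I
observed, not the code ANTLR generates: the class name LookaheadSketch,
the tokenize() method, and the decision to stop at the first error
(rather than attempt any recovery) are all my own assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of "predict on a short prefix, then commit".
// Hypothetical names throughout; this is not ANTLR-generated code.
class LookaheadSketch {
    // The full text of a FOO token: BAR ':' BAZ.
    static final String FOO_TEXT = "bar:baz";

    static List<String> tokenize(String input) {
        List<String> events = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (input.startsWith("ba", pos)) {
                // Prediction: "ba" is enough to choose FOO over EVERYTHING_ELSE.
                // From here on we are committed and never reconsider.
                int matched = 0;
                while (matched < FOO_TEXT.length()
                        && pos + matched < input.length()
                        && input.charAt(pos + matched) == FOO_TEXT.charAt(matched)) {
                    matched++;
                }
                if (matched == FOO_TEXT.length()) {
                    events.add("FOO");
                    pos += matched;
                } else {
                    // Expectation not met: report the character we wanted.
                    events.add("error: missing '" + FOO_TEXT.charAt(matched) + "'");
                    return events; // assumption: stop at the first error
                }
            } else {
                // No prediction for FOO: one character of EVERYTHING_ELSE.
                events.add("EVERYTHING_ELSE");
                pos++;
            }
        }
        return events;
    }

    public static void main(String[] args) {
        String[] inputs = {"bar:baz", "bar:ba", "bar", "ba", "b", "...ba"};
        for (String in : inputs) {
            System.out.println("\"" + in + "\" -> " + tokenize(in));
        }
    }
}
```

Under those assumptions the model reproduces the list above: "b" comes
out as EVERYTHING_ELSE, while "ba" and "...ba" both end in an error
about the missing "r".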

So, one way to get this grammar to handle strings like "...ba"
without throwing exceptions is to use the filter=true option. I'm
curious to know, however, whether there is any other way.
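
For comparison, this is the kind of behaviour I am after, again as a
toy model in plain Java rather than anything ANTLR generates. As far as
I understand it, filter=true makes the lexer attempt a rule
speculatively and rewind if the attempt fails; the sketch below mimics
that by testing for the whole FOO text before committing. The names
FilterSketch and tokenize() are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of "try the whole rule first, fall back on failure".
// Hypothetical names; not ANTLR-generated code.
class FilterSketch {
    // The full text of a FOO token: BAR ':' BAZ.
    static final String FOO_TEXT = "bar:baz";

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (input.startsWith(FOO_TEXT, pos)) {
                // The speculative match of FOO succeeded, so commit to it.
                tokens.add("FOO");
                pos += FOO_TEXT.length();
            } else {
                // The attempt failed; nothing was consumed, so fall back
                // to a single character of EVERYTHING_ELSE.
                tokens.add("EVERYTHING_ELSE");
                pos++;
            }
        }
        return tokens;
    }
}
```

In this model "...ba" yields five EVERYTHING_ELSE tokens and no
error, because a failed attempt at FOO never consumes any input.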

Cheers,
Wincent