[antlr-interest] Limiting tokens selected by lexer in antlr3

Mon Jul 16 14:46:02 PDT 2007

On Monday 16 July 2007 14:06, Frederick N. Brier wrote:
> I created a simple parsing grammar, but two of the tokens utilize
> slightly different alphanumeric letters, but one should only be
> selected within one set of parser rules and the rest of the time, the
> other should be generated.

You must think of lexical analysis as a separate (earlier) phase of 
processing your input ('cause it is). Lexical analysis is not driven 
top-down by the parser, and it has no memory of state or context other 
than that implied by each individual lexer rule (after incorporating any 
fragments it references).

If you understand the difference between a finite-state automaton and a 
push-down automaton, then just realize that lexical analysis is defined 
by regular expressions and hence has only the recognizing capability of 
an FSA (in ANTLR it is realized by a deterministic finite-state 
automaton, or DFA). The parser, by contrast, uses a context-free grammar 
to specify the language it accepts and hence has the recognition 
characteristics of a push-down automaton.
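
To make that split concrete, here is a rough sketch in Python rather 
than ANTLR. The token names (NAME, LPAREN, RPAREN) and the functions 
tokenize() and parse() are invented for illustration and are not 
anything ANTLR generates. The lexer classifies characters with a regular 
expression and knows nothing about nesting; the parser uses recursion 
(an implicit stack) to match parentheses.

```python
import re

# Hypothetical token set for illustration: names and parentheses.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<NAME>[A-Za-z_]\w*)|(?P<LPAREN>\()|(?P<RPAREN>\)))"
)


def tokenize(text):
    """Finite-state phase: turn characters into tokens, left to right."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        # lastgroup names the alternative that matched.
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def parse(tokens):
    """Push-down phase: recursion tracks nesting depth."""

    def expr(i):
        kind, _ = tokens[i] if i < len(tokens) else ("EOF", "")
        if kind == "NAME":
            return tokens[i][1], i + 1
        if kind == "LPAREN":
            items, i = [], i + 1
            while i < len(tokens) and tokens[i][0] != "RPAREN":
                node, i = expr(i)
                items.append(node)
            if i >= len(tokens):
                raise SyntaxError("missing ')'")
            return items, i + 1  # skip the closing parenthesis
        raise SyntaxError(f"unexpected {kind}")

    tree, end = expr(0)
    if end != len(tokens):
        raise SyntaxError("trailing tokens")
    return tree
```

With these definitions, parse(tokenize("(a (b c) d)")) yields 
['a', ['b', 'c'], 'd']. Note that tokenize() runs to completion before 
parse() sees anything, and nothing in parse() can reach back and change 
how a character sequence was classified.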

(ANTLR also has fancy disambiguation capabilities, but none of them help 
you with your problem of thinking of the parser as pulling tokens from 
the lexer.)

> I am using antlr3.  I have read about 
> filters and trees, and ASTs, but have gotten pretty confused.  How do
> I from a parser rule, tell the lexer to pick one rule over another? 

You do not. It just doesn't work that way. There is no communication 
from the parser to the lexical analyzer. Information flows only from 
the input text into the lexical analyzer and thence to the parser.

> When I exit the rule, it should go back to generating the other
> token.

Again, this is not a proper model of the process of recognizing an input 
text using an ANTLR-generated lexical analyzer and parser.

> Please feel free to point me at any documentation or examples 
> you know of.  Thank you.

You're going to have to rethink how you're analyzing your inputs so that 
the lexical analyzer has only the task of partitioning that input into 
tokens without regard for the LL grammar rules that will recognize 
valid sequences over that token set.
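
One way that rethinking might look, sketched in Python under invented 
assumptions: I don't know your actual tokens, so suppose the two 
look-alikes are a hex code that is only legal after the word "color" and 
an ordinary identifier everywhere else. The names WORD, HEXCODE, ID and 
the "color" rule are all hypothetical. The lexer emits a single WORD 
token for every alphanumeric run, and the parser decides what each WORD 
means from its position.

```python
import re

WORD_RE = re.compile(r"[A-Za-z0-9]+")
HEX_RE = re.compile(r"[0-9A-Fa-f]+")


def tokenize(text):
    """Lexer: sees only characters; every alphanumeric run is a WORD.

    Other characters are simply skipped in this sketch.
    """
    return [("WORD", m.group()) for m in WORD_RE.finditer(text)]


def parse(tokens):
    """Parser: classifies each WORD by the context it appears in."""
    result, i = [], 0
    while i < len(tokens):
        _, text = tokens[i]
        if text == "color":
            # Hypothetical rule: 'color' must be followed by a hex code.
            if i + 1 >= len(tokens) or not HEX_RE.fullmatch(tokens[i + 1][1]):
                raise SyntaxError("expected hex code after 'color'")
            result.append(("HEXCODE", tokens[i + 1][1]))
            i += 2
        else:
            # Anywhere else, the same spelling is just an identifier.
            result.append(("ID", text))
            i += 1
    return result
```

Here parse(tokenize("color ff00aa face")) gives 
[('HEXCODE', 'ff00aa'), ('ID', 'face')], even though "face" is also a 
legal run of hex digits. The lexer never had to choose between the two; 
the choice was made where the context is actually known. In an ANTLR 3 
grammar the comparable move would be a single lexer rule plus a check in 
the parser rule, though the details depend on your grammar.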

> Fred

Randall Schulz