[antlr-interest] Limiting tokens selected by lexer in antlr3
Randall R Schulz
rschulz at sonic.net
Mon Jul 16 14:46:02 PDT 2007
On Monday 16 July 2007 14:06, Frederick N. Brier wrote:
> I created a simple parsing grammar, but two of the tokens utilize
> slightly different alphanumeric letters, but one should only be
> selected within one set of parser rules and the rest of the time, the
> other should be generated.
You must think of lexical analysis as a separate (earlier) phase of
processing your input ('cause it is). Lexical analysis is not driven
top-down by the parser, and it has no memory of state or context other
than that implied by each individual lexer rule (after incorporating any
fragments it references).
If you understand the difference between a finite-state automaton and a
push-down automaton, then just realize that lexical analysis is
essentially defined by regular expressions and hence has roughly the
recognizing capability of an FSA (in ANTLR, deterministic finite-state
automata, or DFAs, predict which lexer rule to apply), while the parser
uses a context-free grammar to specify the language it accepts and hence
has the recognition characteristics of a push-down automaton.
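
To make the FSA versus push-down distinction concrete, here is a minimal Python sketch. It is not ANTLR code, and both function names are made up for illustration. The first recognizer remembers nothing but its current state; the second needs a stack, which is the extra power a parser has and a lexer (to a first approximation) does not.

```python
def dfa_accepts_identifier(text):
    """Finite-state recognizer for: letter (letter | digit)*.

    The only memory is the current state name.
    """
    state = "start"
    for ch in text:
        if state == "start" and ch.isalpha():
            state = "in_id"
        elif state == "in_id" and ch.isalnum():
            state = "in_id"
        else:
            return False
    return state == "in_id"


def pda_accepts_balanced(text):
    """Push-down recognizer for balanced parentheses.

    The stack provides the unbounded memory a finite-state machine lacks.
    """
    stack = []
    for ch in text:
        if ch == "(":
            stack.append(ch)
        elif ch == ")":
            if not stack:
                return False
            stack.pop()
        else:
            return False
    return not stack
```
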
(ANTLR also has fancy disambiguation capabilities, but none of them help
you with your problem of thinking of the parser as pulling tokens from
the lexer.)
> I am using antlr3. I have read about
> filters and trees, and ASTs, but have gotten pretty confused. How do
> I from a parser rule, tell the lexer to pick one rule over another?
You do not. It just doesn't work that way. There is no communication
from the parser to the lexical analyzer. Information flows only from
the input text into the lexical analyzer and thence to the parser.
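
As a rough picture of that one-way flow, here is a toy tokenizer in Python. It is a sketch under my own assumptions, not what ANTLR generates; the token names and `tokenize` are hypothetical. Note that the function takes only the input text, so there is simply no channel through which a parser could ask for one token type over another.

```python
import re

# Hypothetical token definitions, for illustration only.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("WS", r"\s+"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Partition the input into (type, text) pairs, dropping whitespace.

    The decision at each position depends only on the characters seen,
    never on which parser rule will later consume the token.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

A parser would then work from the finished token list, e.g. `tokenize("x 42 y1")`, without ever calling back into the tokenizer.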
> When I exit the rule, it should go back to generating the other
> token.
Again, this is not a proper model of the process of recognizing an input
text using an ANTLR-generated lexical analyzer and parser.
> Please feel free to point me at any documentation or examples
> you know of. Thank you.
You're going to have to rethink how you're analyzing your inputs so that
the lexical analyzer has only the task of partitioning that input into
tokens without regard for the LL grammar rules that will recognize
valid sequences over that token set.
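
One possible way to restructure things, sketched in Python rather than as an ANTLR grammar: let the lexer emit a single generic token for every alphanumeric run, and let the parser rule decide from its position what that token means. The `color <name> <hex>` syntax and all names below are invented for the example; I don't know your actual grammar, so treat this as an assumption-laden illustration of the idea and not a drop-in fix.

```python
import re

WORD = re.compile(r"[A-Za-z0-9]+")
HEX_DIGITS = set("0123456789abcdefABCDEF")


def lex_words(text):
    """Context-free lexing: every alphanumeric run is the same WORD token."""
    return [("WORD", w) for w in WORD.findall(text)]


def parse_color_decl(tokens):
    """Hypothetical parser rule: 'color' WORD WORD.

    The second WORD is a name; the third must be spelled with hex digits.
    The context check lives here in the parser, not in the lexer.
    """
    if len(tokens) != 3 or tokens[0] != ("WORD", "color"):
        raise SyntaxError("expected: color <name> <hex>")
    name = tokens[1][1]
    value = tokens[2][1]
    if not set(value) <= HEX_DIGITS:
        raise SyntaxError(f"{value!r} is not a hex value")
    return {"name": name, "value": int(value, 16)}
```

With this arrangement an input such as `color face face` lexes to three identical-looking WORD tokens, and only the parser treats the first `face` as a name and the second as a hex number.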
> Fred
Randall Schulz