[antlr-interest] how to let parser control lexer state.

Sat Apr 28 07:23:33 PDT 2007

femtowin1 wrote:

> Hi all, in ANTLR 3, can the parser control the lexer state
> and decide how the lexer tokenizes? Some grammars have
> ambiguities that can only be resolved with parser knowledge.
>   For example, << in the Ruby grammar:
> x << 1
> test
> 1
> If x is a variable, then << is the shift operator;
> otherwise it starts a heredoc. So the lexer must know
> from the symbol table whether x has been defined
> beforehand. But in the current ANTLR v3 implementation,
> the lexer produces a fixed token stream and feeds
> it to the parser, so this effect can't be achieved.

What you need is a TokenStream class that does not tokenize and buffer the
complete input stream, but calls nextToken() only on demand.
I built such a TokenStream for the Python target, because the SGML parser I
am working on has similar problems. My version seems to work for me so far,
but it may break if the parser needs to look ahead too far: the parser
requests k tokens, which are tokenized in state X, then consumes n<k tokens
and changes the lexer state to Y. Token n+1 has then already been tokenized
in state X, but should have been tokenized in Y. If that cannot happen in
your grammar, you're rather lucky. Otherwise the TokenStream would probably
have to keep track of the lexer state for each token, rewind the input
stream and re-tokenize with a different state if needed.
If you want, I can send you my 'LazyTokenStream.py'. It may help you to
implement the corresponding Java version.
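
To give a rough idea of the approach, here is a minimal sketch. It is not
the 'LazyTokenStream.py' mentioned above and it does not use the ANTLR
runtime API: ToyLexer, its "shift"/"heredoc" modes, set_mode() and the
token tuple layout are all made up for illustration. It only shows the two
ideas: tokens are pulled from the lexer on demand, and each token remembers
where it started, so that buffered lookahead can be thrown away and
re-tokenized when the parser changes the lexer state.

```python
import re

# Toy token pattern: optional whitespace, then "<<" or a word/number.
TOKEN_RE = re.compile(r"\s*(<<|\w+)")


class ToyLexer:
    """Hypothetical stand-in for a generated lexer (not the ANTLR API)."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.mode = "shift"  # assumed modes: "shift" or "heredoc"

    def next_token(self):
        m = TOKEN_RE.match(self.text, self.pos)
        if m is None:
            # End of input (or, in this toy, any unrecognized character).
            return ("EOF", "", self.pos, self.mode)
        start = self.pos
        self.pos = m.end()
        text = m.group(1)
        if text == "<<":
            # The same characters become different tokens per mode.
            kind = "SHIFT" if self.mode == "shift" else "HEREDOC"
        elif text.isdigit():
            kind = "NUM"
        else:
            kind = "ID"
        # Token layout: (kind, text, start offset, mode it was lexed in).
        return (kind, text, start, self.mode)


class LazyTokenStream:
    """Asks the lexer for tokens only when the parser needs them."""

    def __init__(self, lexer):
        self.lexer = lexer
        self.lookahead = []  # tokens lexed but not yet consumed

    def _fill(self, k):
        while len(self.lookahead) < k:
            self.lookahead.append(self.lexer.next_token())

    def LT(self, k):
        """Return the k-th lookahead token (1-based) without consuming."""
        self._fill(k)
        return self.lookahead[k - 1]

    def consume(self):
        self._fill(1)
        return self.lookahead.pop(0)

    def set_mode(self, mode):
        """Called by the parser; discards lookahead lexed in the old mode."""
        if mode == self.lexer.mode:
            return
        if self.lookahead:
            # Rewind the input to the first unconsumed token.
            self.lexer.pos = self.lookahead[0][2]
            self.lookahead.clear()
        self.lexer.mode = mode


def demo():
    stream = LazyTokenStream(ToyLexer("x << 1"))
    peeked = stream.LT(2)[0]    # looks past 'x' while still in "shift" mode
    stream.consume()            # the parser consumes 'x'
    stream.set_mode("heredoc")  # pretend x is not a known variable
    relexed = stream.LT(1)[0]   # '<<' is tokenized again in the new mode
    return peeked, relexed


if __name__ == "__main__":
    print(demo())  # ('SHIFT', 'HEREDOC')
```

Rewinding like this is only safe if nothing has acted on the discarded
lookahead tokens yet. A real implementation would also have to deal with
the parser's own prediction and backtracking state.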

-- 
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://pink.odahoda.de/