[antlr-interest] idea: lexer "sync points"

Thu Feb 7 10:33:08 PST 2008

Hi!

On Feb 7, 2008, at 7:08 PM, J Chapman Flack wrote:

> In some previous thread I glimpsed a statement passing by suggesting
> that ANTLR 3 currently generates a lexer that tokenizes the entire
> input before parsing begins.

That is the out-of-the-box behavior, right.

> 1. Did I hear that right?  (If not, can someone give a quick summary of
>   the right way to understand how an ANTLR 3 lexer really does manage
>   its input character stream and output token stream buffers?  In that
>   case, some of the rest of this message may become moot.)

But there's more to the story:
A lexer is passed the character stream (via a string stream object or
similar) and will ask it for chars.
The lexer itself only knows how to get the next token (via nextToken()
in interface TokenSource for Java).
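
To make the pull relationship concrete, here is a minimal sketch in Java.
The types below (StringCharStream, WordLexer, and the nested Token and
TokenSource) are hypothetical stand-ins written for this example, not the
ANTLR runtime classes; the assumption is only that a lexer reads chars on
demand and hands back one token per nextToken() call.

```java
final class PullLexerSketch {
    /** Hypothetical token type, not ANTLR's Token. */
    static final class Token {
        static final int EOF = -1;
        static final int WORD = 1;
        final int type;
        final String text;
        Token(int type, String text) { this.type = type; this.text = text; }
    }

    /** Hypothetical stand-in for a token source: one token per call. */
    interface TokenSource {
        Token nextToken();
    }

    /** Hypothetical in-memory character stream. */
    static final class StringCharStream {
        private final String data;
        private int pos = 0;
        StringCharStream(String data) { this.data = data; }
        /** Next char without consuming it, or -1 at end of input. */
        int peek() { return pos < data.length() ? data.charAt(pos) : -1; }
        void consume() { if (pos < data.length()) pos++; }
        /** Number of chars consumed so far. */
        int position() { return pos; }
    }

    /** Splits input into whitespace-separated words, one per call. */
    static final class WordLexer implements TokenSource {
        private final StringCharStream in;
        WordLexer(StringCharStream in) { this.in = in; }

        @Override
        public Token nextToken() {
            // Skip leading whitespace.
            while (in.peek() != -1 && Character.isWhitespace(in.peek())) {
                in.consume();
            }
            if (in.peek() == -1) {
                return new Token(Token.EOF, "<EOF>");
            }
            // Read chars only up to the end of the current word.
            StringBuilder sb = new StringBuilder();
            while (in.peek() != -1 && !Character.isWhitespace(in.peek())) {
                sb.append((char) in.peek());
                in.consume();
            }
            return new Token(Token.WORD, sb.toString());
        }
    }

    public static void main(String[] args) {
        WordLexer lexer = new WordLexer(new StringCharStream("sync  points here"));
        for (Token t = lexer.nextToken(); t.type != Token.EOF; t = lexer.nextToken()) {
            System.out.println(t.text);
        }
    }
}
```

Note that the lexer never reads past the token it is currently building;
how far ahead tokens get buffered is decided by whoever calls nextToken().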

It is the fillBuffer method in CommonTokenStream that is actually
responsible for pulling in all the tokens. If you do not want this
behavior (maybe because you are reading input from a socket or pipe)
you could easily provide a different buffered or unbuffered
TokenStream implementation.
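
As a rough illustration of what such an alternative could look like, here
is a sketch in Java of a stream that buffers tokens only as lookahead
demands them. Everything in it (LazyTokenStream, CountingSource, the
lookahead and consume methods, tokens modeled as plain strings) is
hypothetical and written for this example; it does not implement the real
TokenStream interface, which has more methods than shown here.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class LazyStreamSketch {
    static final String EOF = "<EOF>";

    /** Hypothetical stand-in for a token source: one token per call. */
    interface TokenSource {
        String nextToken(); // returns EOF once the input is exhausted
    }

    /** Counts requests so that the laziness is observable from outside. */
    static final class CountingSource implements TokenSource {
        private final Iterator<String> it;
        int pulled = 0;
        CountingSource(List<String> tokens) { this.it = tokens.iterator(); }
        @Override
        public String nextToken() {
            pulled++;
            return it.hasNext() ? it.next() : EOF;
        }
    }

    /** Buffers tokens on demand instead of draining the source up front. */
    static final class LazyTokenStream {
        private final TokenSource source;
        private final List<String> buffer = new ArrayList<>();
        private int p = 0;              // index of the current token
        private boolean hitEof = false; // true once EOF has been buffered

        LazyTokenStream(TokenSource source) { this.source = source; }

        /** Pull tokens until index i is buffered or EOF has been seen. */
        private void sync(int i) {
            while (buffer.size() <= i && !hitEof) {
                String t = source.nextToken();
                buffer.add(t);
                if (EOF.equals(t)) hitEof = true;
            }
        }

        /** Look ahead k tokens (k >= 1) without consuming anything. */
        String lookahead(int k) {
            int i = p + k - 1;
            sync(i);
            return i < buffer.size() ? buffer.get(i) : EOF;
        }

        /** Advance to the next token; stays put once EOF is current. */
        void consume() {
            sync(p);
            if (!EOF.equals(buffer.get(p))) p++;
        }
    }

    public static void main(String[] args) {
        CountingSource src = new CountingSource(List.of("a", "b", "c"));
        LazyTokenStream tokens = new LazyTokenStream(src);
        System.out.println(tokens.lookahead(1) + " after " + src.pulled + " pull(s)");
        System.out.println(tokens.lookahead(2) + " after " + src.pulled + " pull(s)");
    }
}
```

This sketch keeps every token it has seen, so seeking backwards would
still work; a truly unbuffered variant would have to drop consumed tokens
and give up rewinding past them.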

The parser uses the TokenStream interface to get new tokens as it  
needs lookahead information or wants to seek on the stream (or  
whatever you do in actions/sempreds etc).

I think that answers the question, right?

HTH,
-k
-- 
Kay Röpke
http://classdump.org/