[antlr-interest] CharStream for unbounded input stream

Marcin Rzeźnicki marcin.rzeznicki at gmail.com
Wed Apr 28 19:20:19 PDT 2010


On Thu, Apr 29, 2010 at 1:47 AM, Steve Drach <drach at itsit.org> wrote:
> Hmm, I think I've discovered it can't be done since the Lexer apparently reads until it's collected all the tokens and then the parser parses them.  Well now that is a bummer.  Anybody have any clever ideas on how to process variable length messages on unbounded input streams?  They are well formed in the sense that they have matching parentheses (I think).
>

Hi
I think you're wrong on this. The lexer does not need to read all the
tokens up front; it only has to respond to the parser's requests. It is
just a matter of the TokenStream implementation (look at the generated
parser code and see where it ends up calling nextToken - if I am not
mistaken - on the token source). While it is true that the "default"
stream buffers all tokens until EOF, that is just an implementation
detail, so you will probably need to implement or use a different
TokenStream as well (at least that's what I did). What I can recommend,
instead of buffering everything, is to keep a small "window" buffer
that you fill with tokens from the lexer - anything will do, really.
I would be extremely wary of backtracking in your case: if your parser
needs to backtrack, it may end up wanting to go back in the token
stream beyond what you keep in your buffer. So consider either an
ever-expanding buffer that does not forget old tokens, or working out
the maximum backtrack that can occur in your grammar. Good luck, it's
actually quite easy to do.
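
To make the idea concrete, here is a minimal sketch of such a windowed
buffer. The class and field names are my own invention, and a real
implementation would still have to expose the full
org.antlr.runtime.TokenStream interface (consume/LA/LT/mark/rewind/seek/
size/...) to feed a generated parser; only the windowing logic is shown.
It pulls tokens from the lexer's nextToken() on demand, keeps a small
window, and stops trimming while a mark() is open so that backtracking
within the window still works:

import java.util.ArrayList;
import java.util.List;

import org.antlr.runtime.Token;
import org.antlr.runtime.TokenSource;

public class WindowedTokenBuffer {

    private final TokenSource source;                       // the lexer
    private final List<Token> window = new ArrayList<Token>();
    private int discarded = 0;  // tokens already trimmed off the front
    private int cursor = 0;     // index of LT(1) inside the window
    private int openMarks = 0;  // outstanding mark() calls block trimming

    public WindowedTokenBuffer(TokenSource source) {
        this.source = source;
    }

    // Look ahead k tokens (k >= 1) without consuming,
    // pulling from the lexer as needed.
    public Token LT(int k) {
        fill(cursor + k);
        int i = Math.min(cursor + k - 1, window.size() - 1); // clamp at EOF
        return window.get(i);
    }

    // Token type of LT(i); this is what parser decisions actually look at.
    public int LA(int i) {
        return LT(i).getType();
    }

    // Advance past LT(1); if no mark is outstanding,
    // forget the tokens behind us so the window stays small.
    public void consume() {
        cursor++;
        if (openMarks == 0) {
            while (cursor > 0) {
                window.remove(0);
                cursor--;
                discarded++;
            }
        }
    }

    // Remember the current absolute position; trimming stops
    // while marks are open.
    public int mark() {
        openMarks++;
        return discarded + cursor;
    }

    // Go back to a marked position; it must still be inside the window.
    public void rewind(int marker) {
        int offset = marker - discarded;
        if (offset < 0) {
            throw new IllegalStateException(
                "cannot rewind to token " + marker + ": already trimmed");
        }
        cursor = offset;
        openMarks--;
    }

    // Pull tokens from the lexer until the window holds
    // at least n of them, or EOF is reached.
    private void fill(int n) {
        while (window.size() < n) {
            Token t = source.nextToken();
            window.add(t);
            if (t.getType() == Token.EOF) {
                break;
            }
        }
    }
}

The important design point is the mark()/rewind() pair: while the parser
holds a mark, nothing is trimmed, so a backtracking decision can return
to where it started - but only within the window, which is exactly the
limitation described above.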


-- 
Greetings
Marcin Rzeźnicki

