[antlr-interest] Lazy load of CommonTokenStream??

Mon Aug 18 05:31:55 PDT 2008

> Now when I look at the code, there might be another bug or two  
> lurking. At least minor issues, if they are not bugs.
> Let the stream be positioned on the first token.
> 1) For CharStreams, LA(-1) will return CharStream.EOF. I think  
> that's at least inconsistent and should return INVALID_CHAR (which  
> doesn't exist right now), because it's not EOF, technically.
> 2) For TokenStreams, LA(-1) will throw a NullPointerException,  
> because LB(1) returns null. To be consistent LB should return  
> Token.INVALID_TOKEN, thus causing LA(-1) to return  
> Token.INVALID_TOKEN_TYPE. That way there's no extra check and no  
> exception being thrown, making all calls (except LA(0)) to those  
> methods safe from an exception point of view.
> 3) really minor: the naming scheme for Token.EOF (an int) and  
> Token.INVALID_TOKEN_TYPE (also an int) is slightly off, but it's  
> probably WorldOfPain(tm) to change it, so let's not bother. :)
>
> Opinions?
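
For reference, point 2 of the quoted proposal could be sketched roughly
as below. This is a simplified stand-in, not the ANTLR runtime: the
classes Tok and SentinelTokenStream are hypothetical, and the constant
values (0 for the invalid type, -1 for EOF) are assumptions made for
this illustration only.

```java
// Hypothetical token type; only the field needed for the illustration.
final class Tok {
    static final int INVALID_TOKEN_TYPE = 0;   // assumed value
    static final Tok INVALID_TOKEN = new Tok(INVALID_TOKEN_TYPE);
    final int type;
    Tok(int type) { this.type = type; }
}

// Hypothetical stream in which looking back past the start yields a sentinel.
final class SentinelTokenStream {
    static final int EOF = -1;                 // assumed value
    private final Tok[] tokens;
    private int p = 0;                         // index of the current token

    SentinelTokenStream(Tok... tokens) { this.tokens = tokens.clone(); }

    void consume() { if (p < tokens.length) p++; }

    // Look back k tokens (k >= 1). Never returns null.
    Tok LB(int k) {
        if (k < 1) throw new IllegalArgumentException("k must be >= 1");
        int index = p - k;
        return index < 0 ? Tok.INVALID_TOKEN : tokens[index];
    }

    // Type of the token at relative offset i; LA(0) stays undefined.
    int LA(int i) {
        if (i == 0) throw new IllegalArgumentException("LA(0) is undefined");
        if (i < 0) return LB(-i).type;         // no null check needed
        int index = p + i - 1;
        return index >= tokens.length ? EOF : tokens[index].type;
    }
}
```

With this shape, LA(-1) on a freshly positioned stream returns
Tok.INVALID_TOKEN_TYPE instead of failing on a null token.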

You might also consider looking back behind the start of the stream to
be an error on the part of the user of the interface. That is, while we
do need an EOF token to signal that the end of the stream was reached,
consumers of the token/char stream always know that there is nothing
before index 0, so requesting it is a programming error and should
result in a runtime exception. It doesn't make sense anyway, so why
bother creating a specific return value for that?

I also don't quite know why LA(0) is illegal. I once implemented my
own lazy token stream, and I have to say I was heavily confused about
the meaning of the various indexes, positions, relative offsets etc.
in the class. It might just be me, but I found it quite confusing and
overly complex.

Regards,
Martin