[antlr-interest] Lazy load of CommonTokenStream??
Martin Probst
mail at martin-probst.com
Mon Aug 18 05:31:55 PDT 2008
> Now when I look at the code, there might be another bug or two
> lurking. At least minor issues, if they are not bugs.
> Let the stream be positioned on the first token.
> 1) For CharStreams, LA(-1) will return CharStream.EOF. I think
> that's at least inconsistent and should return INVALID_CHAR (which
> doesn't exist right now), because it's not EOF, technically.
> 2) For TokenStreams, LA(-1) will throw a NullPointerException,
> because LB(1) returns null. To be consistent LB should return
> Token.INVALID_TOKEN, thus causing LA(-1) to return
> Token.INVALID_TOKEN_TYPE. That way there's no extra check and no
> exception being thrown, making all calls (except LA(0)) to those
> methods safe from an exception point of view.
> 3) really minor: the naming scheme for Token.EOF (an int) and
> Token.INVALID_TOKEN_TYPE (also an int) is slightly off, but it's
> probably WorldOfPain(tm) to change it, so let's not bother. :)
>
> Opinions?
You could also treat looking back before the start of the stream as
an error on the part of the interface's user. That is, while we do
need an EOF token to signal that the end of the stream was reached,
consumers of the token/char stream always know that there is nothing
before index 0, so requesting that is a programming error and should
result in a runtime exception. It doesn't make sense anyway, so why
bother creating a specific return value for it?
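To make the two options concrete, here is a minimal Java sketch. It is
not ANTLR's code: LookbackDemo, laSentinel and laStrict are made-up
names, the "stream" is just an int array of token types, and the
constant values are assumptions of the sketch. It only contrasts the
sentinel return value proposed in the quoted message with the
fail-fast behaviour suggested above.

```java
// Hypothetical, simplified token stream; not ANTLR's actual classes.
final class LookbackDemo {
    static final int EOF = -1;               // end-of-stream marker (value assumed)
    static final int INVALID_TOKEN_TYPE = 0; // sentinel from the quoted proposal (value assumed)

    private final int[] types; // token types in stream order
    private int p = 0;         // index of the current token, i.e. LA(1)

    LookbackDemo(int... types) {
        this.types = types.clone();
    }

    void consume() {
        if (p < types.length) {
            p++;
        }
    }

    // Option A (quoted proposal): looking before the start yields a sentinel.
    int laSentinel(int k) {
        if (k == 0) {
            throw new IllegalArgumentException("LA(0) is undefined");
        }
        int i = (k > 0) ? p + k - 1 : p + k;
        if (i < 0) {
            return INVALID_TOKEN_TYPE;
        }
        return (i >= types.length) ? EOF : types[i];
    }

    // Option B (this reply): looking before the start is a caller bug.
    int laStrict(int k) {
        if (k == 0) {
            throw new IllegalArgumentException("LA(0) is undefined");
        }
        int i = (k > 0) ? p + k - 1 : p + k;
        if (i < 0) {
            throw new IndexOutOfBoundsException("lookback before start: LA(" + k + ")");
        }
        return (i >= types.length) ? EOF : types[i];
    }

    public static void main(String[] args) {
        LookbackDemo s = new LookbackDemo(4, 5, 6);
        System.out.println("sentinel: " + s.laSentinel(-1));
        try {
            s.laStrict(-1);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("strict: " + e.getMessage());
        }
    }
}
```

With option A the caller has to remember to compare against the
sentinel; with option B the mistake surfaces at the call site. In both
cases running past the end still yields EOF, since that is a normal
condition rather than a bug.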
I also don't quite know why LA(0) is illegal. I once implemented my
own lazy token stream, and I have to say I was thoroughly confused
about the meaning of the various indexes, positions, relative offsets,
etc. in the class. It might just be me, but I found it quite confusing
and overly complex.
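One possible reading of the offsets, sketched in Java below, is that
LA(1) is the current token and LA(-1) the one just before it, which
leaves no token for offset 0 to name. This assumes that convention; the
class RelativeIndex and the method absolute are hypothetical names, not
part of the ANTLR runtime.

```java
// Hypothetical helper illustrating one reading of the relative offsets.
final class RelativeIndex {
    // p is the absolute index of the current token, i.e. what LA(1) refers to.
    static int absolute(int p, int k) {
        if (k == 0) {
            // Positive offsets start at the current token, negative ones
            // at the previous token, so 0 has nothing to refer to.
            throw new IllegalArgumentException("offset 0 names no token");
        }
        return (k > 0) ? p + k - 1 : p + k;
    }

    public static void main(String[] args) {
        int p = 5;
        for (int k : new int[] {-2, -1, 1, 2}) {
            System.out.println("LA(" + k + ") -> index " + absolute(p, k));
        }
    }
}
```

A negative result from absolute, for example with p = 0 and k = -1,
is exactly the "before the start" case discussed above.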
Regards,
Martin