[antlr-interest] Lazy load of CommonTokenStream??
kroepke at classdump.org
Mon Aug 18 05:29:45 PDT 2008
On Aug 18, 2008, at 2:07 PM, Raphael Reitzig wrote:
> Why is LA(0) undefined? One could interpret LA(0) (and LB(0) as
> well) as looking ahead (back) zero steps, thus finding the current
> char/token. Would this be a problem of some kind? In fact, I would
> expect exactly this behaviour, because LA(1) yields the next char to
> the right and LA(-1) the next char to the left. As it is, I feel
> it's unnecessarily (?) non-continuous.
The semantics are that a stream is never "on" a specific element, but
rather "in between". Then LA(-1) is the just consumed element and
LA(1) is the next one to be looked at. With those semantics LA(0)
makes no sense, because it's not a valid element. An element (char,
token or node) is either consumed or not, so I think it makes sense to
specify it this way.
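
To make the "in between" picture concrete, here is a minimal sketch of a char stream whose cursor sits between elements. It is not ANTLR's actual code: the class name BetweenCursorStream is made up for illustration, and throwing on LA(0) is an assumption of the sketch (the point above is only that LA(0) is undefined).

```java
// Hypothetical illustration only; not the real ANTLRStringStream.
final class BetweenCursorStream {
    static final int EOF = -1;

    private final char[] data;
    // Number of elements consumed so far. The cursor sits in the gap
    // just before data[p], never "on" an element.
    private int p = 0;

    BetweenCursorStream(String input) {
        this.data = input.toCharArray();
    }

    void consume() {
        if (p < data.length) {
            p++;
        }
    }

    int LA(int i) {
        if (i == 0) {
            // Assumption of this sketch: reject the undefined case loudly.
            throw new IllegalArgumentException(
                "LA(0) is undefined: the cursor is between elements");
        }
        // LA(1) is data[p] (next to be looked at),
        // LA(-1) is data[p - 1] (just consumed).
        int index = (i > 0) ? p + i - 1 : p + i;
        if (index < 0 || index >= data.length) {
            return EOF; // off either end of the buffer
        }
        return data[index];
    }
}
```

With the input "ab" and nothing consumed, LA(1) is 'a' and LA(-1) falls off the beginning; after one consume(), LA(-1) is 'a' and LA(1) is 'b'.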
>> 1) For CharStreams, LA(-1) will return CharStream.EOF. I think
>> that's at least inconsistent and should return INVALID_CHAR (which
>> doesn't exist right now), because it's not EOF, technically.
> If you regard looking back as being equivalent to looking forward on
> a reversed stream, EOF makes sense. There _is_ end of "file", if not
> the right one.
> I think it is consistent as it is. Looking out of char stream's
> bounds yields EOF.
> What's with LA(k), |k|>1? Is this handled properly when looking
> "past" EOF?
I think LA() is broken for TokenStreams: it can throw an exception for
negative indices, because LB() returns null instead of EOF_TOKEN. It
should not throw under any circumstances. The fix is rather trivial,
and since LA(-1) at the beginning of a TokenStream does not work right
now because of the exception, I don't think the fix could possibly
break anything.
For the end of the stream, everything is fine, because the index
checks in LT return EOF_TOKEN for everything beyond the buffer. The
beginning is different in that it returns null, causing LA to fail.
I'd say it should always return a token, that's all.
Note this is only a problem with CommonTokenStream; the CharStream and
TreeNodeStream implementations do the correct thing, AFAICS (the code
is really different in both cases).
CommonTreeNodeStream will return EOF when falling off the end of the
buffer, but INVALID_TOKEN when falling off the beginning.
CharStream (really ANTLRStringStream) will return CharStream.EOF in
both cases, but as an int, since there are no tokens just yet.
It's easy to make CommonTokenStream.LA(-1) safe in this regard and I
think you correctly pointed out that it should return EOF in that
case. The fix would be to not return null in LB() when the index is
out of bounds at the lower end.
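
Roughly what I have in mind, as a self-contained sketch rather than a patch. Tok and SafeLookbackStream are stand-ins I made up so the example compiles on its own; they are not the real Token or CommonTokenStream classes, and the sketch ignores off-channel tokens and rejects LT(0) outright.

```java
// Hypothetical stand-in for a token; only the type matters here.
final class Tok {
    static final int EOF_TYPE = -1;
    static final Tok EOF_TOKEN = new Tok(EOF_TYPE, "<EOF>");

    final int type;
    final String text;

    Tok(int type, String text) {
        this.type = type;
        this.text = text;
    }
}

// Hypothetical stand-in for a buffered token stream.
final class SafeLookbackStream {
    private final Tok[] tokens;
    private int p = 0; // index of the next token to be consumed

    SafeLookbackStream(Tok... tokens) {
        this.tokens = tokens;
    }

    void consume() {
        if (p < tokens.length) {
            p++;
        }
    }

    // Look back k tokens (k >= 1). The point of the sketch: return
    // EOF_TOKEN instead of null when falling off the lower end.
    private Tok LB(int k) {
        if (p - k < 0) {
            return Tok.EOF_TOKEN;
        }
        return tokens[p - k];
    }

    Tok LT(int k) {
        if (k == 0) {
            throw new IllegalArgumentException("LT(0) is undefined");
        }
        if (k < 0) {
            return LB(-k);
        }
        int i = p + k - 1;
        return (i >= tokens.length) ? Tok.EOF_TOKEN : tokens[i];
    }

    // Never dereferences null for k != 0, because LT always returns a token.
    int LA(int k) {
        return LT(k).type;
    }
}
```

With this shape, LA(-1) on a fresh stream yields the EOF type instead of blowing up, and both ends of the buffer behave the same way.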
Interestingly enough, a CommonTreeNodeStream will fill the buffer when
get() is called, while CommonTokenStream just fails in case the buffer
is not filled yet. Same thing for size(). Probably best to make the
implementations do the same thing, because that's what I'd expect as
an API user.
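
The behaviour I'd expect from get() and size() can be sketched like this. LazyBuffer and fillIfNeeded() are hypothetical names for illustration, and a plain Iterator of strings stands in for the token source; none of this is the actual CommonTokenStream or CommonTreeNodeStream code.

```java
// Hypothetical illustration of "fill the buffer on first access".
final class LazyBuffer {
    private final java.util.Iterator<String> source;
    private final java.util.List<String> buffer = new java.util.ArrayList<>();
    private boolean filled = false;

    LazyBuffer(java.util.Iterator<String> source) {
        this.source = source;
    }

    // Drain the source into the buffer once, the first time anyone asks.
    private void fillIfNeeded() {
        if (filled) {
            return;
        }
        while (source.hasNext()) {
            buffer.add(source.next());
        }
        filled = true;
    }

    // Both accessors trigger the fill, so callers never see an
    // empty or unfilled buffer just because nothing was consumed yet.
    String get(int i) {
        fillIfNeeded();
        return buffer.get(i);
    }

    int size() {
        fillIfNeeded();
        return buffer.size();
    }
}
```

Calling size() or get() first on a fresh LazyBuffer then gives the same answer as calling it after the stream has been walked, which is the consistency I'm after.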