[antlr-interest] Lazy load of CommonTokenStream??

Kay Röpke kroepke at classdump.org
Mon Aug 18 05:29:45 PDT 2008


Moin!

On Aug 18, 2008, at 2:07 PM, Raphael Reitzig wrote:

> Why is LA(0) undefined? One could interpret LA(0) (and LB(0) as  
> well) as looking ahead (back) zero steps, thus finding the current  
> char/token. Would this be a problem of some kind? In fact, I would  
> expect exactly this behaviour, because LA(1) yields the next char to  
> the right and LA(-1) the next char to the left. As it is, I feel  
> it's unnecessarily (?) non-continuous.

The semantics are that a stream is never "on" a specific element, but  
rather "in between". Then LA(-1) is the just consumed element and  
LA(1) is the next one to be looked at. With those semantics LA(0)  
makes no sense, because it's not a valid element. An element (char,  
token or node) is either consumed or not, so I think it makes sense to  
specify it this way.
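
To make the "in between" picture concrete, here is a minimal sketch over a plain String. BetweenCursorStream and its members are made-up names for illustration, not ANTLR runtime code, and throwing on LA(0) is just one way to express "undefined" in this sketch.

```java
// Hypothetical illustration of "cursor between elements" semantics.
// Not ANTLR code; all names here are assumptions made for the example.
final class BetweenCursorStream {
    static final int EOF = -1;

    private final String data;
    // The cursor sits between data[p-1] (just consumed) and data[p] (next).
    private int p = 0;

    BetweenCursorStream(String data) {
        this.data = data;
    }

    void consume() {
        if (p < data.length()) {
            p++;
        }
    }

    int LA(int i) {
        if (i == 0) {
            // There is no element "at" the cursor, only to its left or right.
            throw new IllegalArgumentException("LA(0) is undefined");
        }
        // i > 0 looks to the right of the cursor, i < 0 to the left.
        int index = (i > 0) ? p + i - 1 : p + i;
        if (index < 0 || index >= data.length()) {
            return EOF; // out of bounds on either side
        }
        return data.charAt(index);
    }
}
```

With the input "ab" and nothing consumed, LA(1) is 'a' and LA(-1) falls off the left edge. After one consume(), LA(-1) is 'a' and LA(1) is 'b'.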

>> 1) For CharStreams, LA(-1) will return CharStream.EOF. I think  
>> that's at least inconsistent and should return INVALID_CHAR (which  
>> doesn't exist right now), because it's not EOF, technically.
> If you regard looking back as being equivalent to looking forward on  
> a reversed stream, EOF makes sense. There _is_ end of "file", if not  
> the right one.
> I think it is consistent as it is. Looking out of char stream's  
> bounds yields EOF.

Touché :)

> What's with LA(k), |k|>1? Is this handled properly when looking  
> "past" EOF?

I think LA() is broken for TokenStreams in that it can throw an  
exception for negative indices, because LB() returns null instead of  
EOF_TOKEN. It should not throw under any circumstances. The fix is  
rather trivial, and since LA(-1) at the beginning of a TokenStream  
does not work right now anyway because of the exception, I don't  
think it could possibly break anything.
For the end of the stream, everything is fine, because the index  
checks in LT return EOF_TOKEN for everything beyond the buffer. The  
beginning is different in that it returns null, causing LA to fail.  
I'd say it should always return a token, that's all.

Note this is only a problem with CommonTokenStream, the CharStream and  
TreeNodeStream implementations do the correct thing, AFAICS (the code  
is really different in both cases).
CommonTreeNodeStream will return EOF when falling off the end of the  
buffer, but INVALID_TOKEN when falling off the beginning.
CharStream (really ANTLRStringStream) will return CharStream.EOF in  
both cases, but in terms of ints, since there are no tokens just yet.

It's easy to make CommonTokenStream.LA(-1) safe in this regard and I  
think you correctly pointed out that it should return EOF in that  
case. The fix would be to not return null in LB() when the index is  
out of bounds at the lower end.
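
Roughly what I have in mind, as a self-contained sketch. SketchToken and SketchTokenStream are stand-ins I made up for this mail, not the real runtime classes, and the handling of k == 0 (mapped to EOF_TOKEN so LA never dereferences null) is an assumption of the sketch, not a proposal.

```java
import java.util.List;

// Hypothetical stand-ins for illustration; not the ANTLR runtime classes.
final class SketchToken {
    static final int EOF_TYPE = -1;
    static final SketchToken EOF_TOKEN = new SketchToken(EOF_TYPE);

    final int type;

    SketchToken(int type) {
        this.type = type;
    }
}

final class SketchTokenStream {
    private final List<SketchToken> tokens;
    private int p = 0; // index of the next token to be consumed

    SketchTokenStream(List<SketchToken> tokens) {
        this.tokens = tokens;
    }

    void consume() {
        if (p < tokens.size()) {
            p++;
        }
    }

    // Look back k tokens (k >= 1). Falling off the lower end yields
    // EOF_TOKEN instead of null, so callers never see a null token.
    SketchToken LB(int k) {
        int index = p - k;
        if (index < 0) {
            return SketchToken.EOF_TOKEN;
        }
        return tokens.get(index);
    }

    SketchToken LT(int k) {
        if (k == 0) {
            // Assumption of this sketch only: keep LT total for k == 0.
            return SketchToken.EOF_TOKEN;
        }
        if (k < 0) {
            return LB(-k);
        }
        int index = p + k - 1;
        if (index < 0 || index >= tokens.size()) {
            return SketchToken.EOF_TOKEN; // beyond the buffer (or overflow)
        }
        return tokens.get(index);
    }

    // Safe for any k because LT always returns a token.
    int LA(int k) {
        return LT(k).type;
    }
}
```

The point is only that LT always hands back a token, so LA(-1) at the very beginning of the stream yields the EOF type instead of blowing up.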

Interestingly enough, a CommonTreeNodeStream will fill the buffer when  
get() is called, while CommonTokenStream just fails in case the buffer  
is not filled yet. Same thing for size(). Probably best to make the  
implementations do the same thing, because that's what I'd expect as  
an API user.
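
The "fill on first access" behaviour could look something like the following sketch. LazyBuffer is a hypothetical class for illustration, reading from a plain Iterator rather than a TokenSource, and is not how either stream is actually implemented.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration of lazily filling a buffer on first access.
// Not ANTLR code; the names are assumptions made for this example.
final class LazyBuffer<T> {
    private final Iterator<T> source;
    private List<T> buffer; // stays null until the first get() or size()

    LazyBuffer(Iterator<T> source) {
        this.source = source;
    }

    private void fillIfNeeded() {
        if (buffer != null) {
            return;
        }
        buffer = new ArrayList<>();
        while (source.hasNext()) {
            buffer.add(source.next());
        }
    }

    T get(int i) {
        fillIfNeeded();
        return buffer.get(i);
    }

    int size() {
        fillIfNeeded();
        return buffer.size();
    }

    boolean isFilled() {
        return buffer != null;
    }
}
```

Both get() and size() go through the same fillIfNeeded() check, so a caller never has to know whether the buffer has been populated yet.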

cheers,
-k
-- 
Kay Röpke
http://classdump.org/







