[antlr-interest] Bug in DFA matching?

Mon Feb 9 15:14:21 PST 2009

On Mon, Feb 9, 2009 at 3:12 PM, Gavin Lambert <antlr at mirality.co.nz> wrote:
> At 08:55 10/02/2009, C. Scott Ananian wrote:
>>// whitespace at start of line used for INDENT processing
>>INITIAL_WS
>>       : {getCharPositionInLine()==1 && !afterIndent}? // at start of line.
>>       ( ' ' | TAB )*
>>    { this.afterIndent=true; }
>>    ;
>>
>>Note the star in the INITIAL_WS rule, which means that *every*
>>line should emit an INITIAL_WS token, possibly matching nothing,
>>before matching anything else.
>
> You must never do that.  If a lexer rule can ever match nothing, then it can
> always match nothing, and will therefore produce an infinite number of
> matching-nothing tokens, causing an infinite loop (until you run out of
> memory).  Top-level lexer rules must always match at least one character.

I think you misunderstood me, or misread the grammar.  It matches
nothing *at the beginning of the line*, and then afterIndent is set to
true, so the predicate fails and the rule doesn't match again on that line.
That's the intended behavior.  It worked in ANTLRv2; it seems to be a
regression in ANTLRv3.
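
To make the intent concrete, here is a rough sketch of the behavior I
expect, written as a toy Java loop rather than real ANTLR code.  The class
name, the token names, the MAX_TOKENS cap, and the reset of afterIndent on
newline are all assumptions made for illustration; none of them come from
ANTLR itself or from the grammar quoted above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a lexer loop. This is NOT ANTLR's API, only an illustration.
final class InitialWsSketch {
    // Hypothetical cap so the unguarded case stops instead of looping forever.
    static final int MAX_TOKENS = 50;

    static List<String> tokenize(String input, boolean guarded) {
        List<String> tokens = new ArrayList<>();
        boolean afterIndent = false; // mirrors the flag used in the predicate
        int pos = 0;
        while (pos < input.length() && tokens.size() < MAX_TOKENS) {
            boolean atLineStart = pos == 0 || input.charAt(pos - 1) == '\n';
            char c = input.charAt(pos);
            if (atLineStart && !(guarded && afterIndent)) {
                // INITIAL_WS: ( ' ' | TAB )* may consume zero characters.
                int start = pos;
                while (pos < input.length()
                        && (input.charAt(pos) == ' ' || input.charAt(pos) == '\t')) {
                    pos++;
                }
                tokens.add("INITIAL_WS:" + (pos - start));
                afterIndent = true; // same effect as the action in the rule
            } else if (c == '\n') {
                tokens.add("NEWLINE");
                pos++;
                afterIndent = false; // assumed: some newline rule resets the flag
            } else {
                // Everything else up to the end of the line is one TEXT token.
                int start = pos;
                while (pos < input.length() && input.charAt(pos) != '\n') {
                    pos++;
                }
                tokens.add("TEXT:" + input.substring(start, pos));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Guarded: one INITIAL_WS per line, possibly empty, then normal tokens.
        System.out.println(tokenize("a\n  b\n", true));
        // Unguarded: the empty match repeats at position 0 until the cap is hit.
        System.out.println(tokenize("a\n", false).size());
    }
}
```

With guarded set to true, the flag plays the role of the predicate and each
line yields exactly one INITIAL_WS token.  With guarded set to false, the
empty match never advances the input, which is the infinite loop Gavin
describes.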
 --scott

-- 
                         ( http://cscott.net/ )