[antlr-interest] Looping Lexer

Stephen Gargan evilsteevil at gmail.com
Mon Jun 11 16:57:55 PDT 2007


I've been seeing a situation with Antlr and Antlrworks (1.0.2) and was
hoping someone could shed some light on it for me. I'm playing around with a
very simple grammar to familiarize myself with the concepts. When I try to
debug this grammar, the debugger fails to connect, and the spawned debug
process never terminates and takes over the system.

Having looked at the Antlr code generated by Antlrworks I have seen the
following. The __Test__.java file creates an instance of the lexer, uses the
lexer instance to create a CommonTokenStream and passes this to the
parser to fill its buffer. At this stage the application goes into an
infinite loop requesting tokens from the lexer. The generated lexer is
broken and will either get stuck lexing a particular token (emitting
nothing) or (with slight modifications to the grammar) repeatedly emit the
same token.

The debug parser will not open its debug socket until after the buffer has
been filled, which leads to two outcomes. When the lexer repeatedly emits the
same token, the buffer in the token stream grows until an OutOfMemoryError is
thrown. The spawned process ends, and Antlrworks tries to connect to the
finished process and eventually fails.
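
To make this first outcome concrete, here is a rough model of the buffer fill
as I understand it. This is not the ANTLR runtime code: TokenSource,
StuckSource, ListSource and the cap parameter are names I made up for the
illustration, and the cap exists only so the demonstration terminates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BufferFillSketch {
    static final String EOF = "<EOF>";

    /** Hypothetical stand-in for a lexer: hands out one token per call. */
    interface TokenSource {
        String nextToken();
    }

    /** Models the broken lexer: emits the same token forever, never EOF. */
    static class StuckSource implements TokenSource {
        public String nextToken() {
            return "NAME";
        }
    }

    /** Models a healthy lexer: a fixed list of tokens followed by EOF. */
    static class ListSource implements TokenSource {
        private final List<String> tokens;
        private int next = 0;

        ListSource(List<String> tokens) {
            this.tokens = tokens;
        }

        public String nextToken() {
            return next < tokens.size() ? tokens.get(next++) : EOF;
        }
    }

    /** Pulls tokens until EOF, giving up once cap tokens are buffered. */
    static List<String> fill(TokenSource source, int cap) {
        List<String> buffer = new ArrayList<String>();
        String t = source.nextToken();
        while (!EOF.equals(t) && buffer.size() < cap) {
            buffer.add(t);
            t = source.nextToken();
        }
        return buffer;
    }

    public static void main(String[] args) {
        TokenSource healthy = new ListSource(Arrays.asList("VOLUME", "NAME"));
        System.out.println(fill(healthy, 1000).size());
        System.out.println(fill(new StuckSource(), 1000).size());
    }
}
```

With the healthy source the loop stops at EOF after two tokens. With the stuck
source it stops only because of the cap, where the real buffer would presumably
keep growing until memory runs out.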

The other case is worse though as the spawned process is left running,
spinning in its lexing loop, chewing up resources. Antlrworks cannot connect
and the spawned process is left hanging. Antlrworks could possibly mitigate
this by monitoring the spawned process and attempting to kill it if the
connection is unsuccessful.
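
A minimal sketch of the kind of clean-up I have in mind follows. I have not
looked at how Antlrworks actually manages the process, so the class name, the
connected flag and the grace period are all hypothetical, and the Process
methods used assume Java 8 or later.

```java
import java.util.concurrent.TimeUnit;

public class DebugProcessWatchdog {

    /**
     * Kills the spawned process if the debugger never connected to it.
     * Returns true if a kill was attempted, false if nothing needed doing.
     * The connected flag is an assumption: the caller decides how to detect it.
     */
    static boolean killIfUnconnected(Process spawned, boolean connected, long graceMillis) {
        if (connected || !spawned.isAlive()) {
            return false;
        }
        spawned.destroy(); // polite termination request first
        try {
            if (!spawned.waitFor(graceMillis, TimeUnit.MILLISECONDS)) {
                spawned.destroyForcibly(); // still running after the grace period
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            spawned.destroyForcibly();
        }
        return true;
    }
}
```

The caller would invoke this once the connection attempt has timed out, so a
lexer spinning in its loop does not outlive the failed debug session.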

The question remains though, what is wrong with my grammar that would cause
it to loop like this? It compiles fine, though that is not to say it is
valid ;), I'm new to all of this and would not be surprised if it were my
fault (indeed I'm hoping this is the case). Is there anything obviously
wrong in my grammar that would cause it to loop like this? If so, are there
any tell-tale signs I should look for to avoid this problem? Might there
be a way for the lexer to know it is stuck in a loop and complain?
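
On that last question, one check I can imagine is for the token loop to
compare the input position before and after each token and complain when
nothing was consumed. The sketch below is only a toy model of that idea, not
the generated lexer or any ANTLR API: Matcher and countTokens are made-up
names, and I don't know whether this matches how the real lexer gets stuck.

```java
public class ProgressCheckSketch {

    /** Hypothetical matcher: returns the input index after one token starting at pos. */
    interface Matcher {
        int match(String input, int pos);
    }

    /** Counts tokens, but throws as soon as a match consumes no input. */
    static int countTokens(String input, Matcher matcher) {
        int pos = 0;
        int count = 0;
        while (pos < input.length()) {
            int end = matcher.match(input, pos);
            if (end <= pos) {
                throw new IllegalStateException("no input consumed at index " + pos);
            }
            pos = end;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // A matcher that always consumes one character makes progress.
        System.out.println(countTokens("abc", (text, at) -> at + 1));

        // A matcher that consumes nothing is reported instead of looping forever.
        try {
            countTokens("abc", (text, at) -> at);
        } catch (IllegalStateException e) {
            System.out.println("stuck: " + e.getMessage());
        }
    }
}
```

Failing fast like this would at least turn a silent hang into an error that
names the input position.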

grammar simple;

paths    :     (path)*;
path    :    VOLUME | NAME NEWLINE;
VOLUME    :    'volume' NUMBER;
NAME    :    (WORD? WS?)*;
NUMBER    :     ('0'..'9')+ ;
WORD     :    ('a'..'z'|'A'..'Z'|'_')*;
WS     :     ( '\t' | ' ')+     { $channel = HIDDEN; } ;
NEWLINE :    '\r'? '\n' ;

The following snippet of code runs against the files generated by Antlrworks
and gets stuck in the lexer. The test input is:

volume1
this is a test volume2

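import org.antlr.runtime.ANTLRFileStream;
import org.antlr.runtime.Token;

public class LexerTest {

    public static void main(String[] args) throws Exception {
        simpleLexer lex = new simpleLexer(
                new ANTLRFileStream("/tmp/antlrworks/__Test___input.txt"));
        System.out.println("Created Lexer");

        // The ANTLR 3 lexer signals end of input with an EOF token rather
        // than null, so the loop tests the token type.
        Token t;
        do {
            t = lex.nextToken();
            System.out.println(t.toString());
        } while (t.getType() != Token.EOF);

        System.out.println("done"); // never gets here
    }
}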


I am working on reproducing the grammar that caused the repeated emission of
the same token. I'll post again when I have it failing that way again. Does
anyone know what might be wrong?

Thanks in advance,

Stephen.