[antlr-interest] Error recovery in lexer (~ unclosed string)

Thibaut Colar tcolar at colar.net
Tue Jun 30 17:53:47 PDT 2009


Hello there.

In my grammar file, I have a lexer rule to match DSL strings like this:
DSL        :'<|' ( options {greedy=false;} : . )* '|>' ;

Now my lexer is hooked up into the NetBeans IDE to parse files (well, 
lexing only for now), and that works well.
The problem is that if you type an opening <| (unclosed at this time), 
the lexer errors out.
What happens is that it goes all the way to the end of the file trying 
to find the closing |>, hits EOF and throws an exception (null 
nextToken / unmatched tokens left).
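
To make the failure concrete, here is a minimal standalone sketch in plain Java of what the non-greedy loop amounts to: a search for the first |> after the opener. It is not ANTLR-generated code, and the names DslScan and findDslEnd are made up for illustration.

```java
/** Hypothetical helper, only meant to mirror what the DSL rule searches for. */
final class DslScan {
    /**
     * Returns the index just past the first closing "|>" that follows the
     * "<|" opener at 'start', or -1 if the end of input is reached first.
     */
    static int findDslEnd(String input, int start) {
        if (!input.startsWith("<|", start)) {
            throw new IllegalArgumentException("no <| opener at index " + start);
        }
        // Search begins after the two opener characters.
        int close = input.indexOf("|>", start + 2);
        return close < 0 ? -1 : close + 2;
    }
}
```

The -1 case is the one that matters here: while the closing |> has not been typed yet, the search always runs off the end of the file.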

I'm not surprised it doesn't like this, but I want to add some error 
recovery so it does not fail.
I've tried many things in the last 24 hours but can't seem to find 
anything that works.

Latest try is basically this (using states):
Grammar:
DSL        :'<|' {state=INCOMPLETE_DSL;} ( options {greedy=false;} : . )* '|>' {state=NORMAL;} ;

Lexer: ------------------------
    public Token<FanTokenID> nextToken()
    {
        curToken = (CommonToken) lexer.nextToken();
        if (curToken.getType() == -1) // prob. EOF
        {
            int state = lexer.getState();
            switch (state)
            {
                case FanStates.INCOMPLETE_DSL:
                    // set as incomplete token type
                    curToken.setType(FanLexer.INCOMPLETE_DSL);
                    break;
            }
            lexer.clearState();
        }
        // ... (conversion to Token<FanTokenID> and return omitted)
    }
------------------------

However, that still does not work quite right: it does replace the token 
correctly, but I guess after that it does not continue at the right 
place (do I need a rewind() or consume() or something?)

Or maybe I'm doing this the wrong way. Should I "emit" a fake closing 
token (|>) instead?
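
To show the behaviour I am after, here is a rough standalone sketch in plain Java. It is a hand-written scanner, not ANTLR or NetBeans API, and Kind, Tok and RecoveringScanner are hypothetical names. When EOF is reached before |>, it returns one INCOMPLETE_DSL token covering everything from the opener to the end of input, so the next call cleanly returns EOF and no position is lost.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical token kinds; not the generated FanLexer constants. */
enum Kind { TEXT, DSL, INCOMPLETE_DSL, EOF }

/** A token as a half-open [start, end) range over the input. */
final class Tok {
    final Kind kind;
    final int start;
    final int end;

    Tok(Kind kind, int start, int end) {
        this.kind = kind;
        this.start = start;
        this.end = end;
    }
}

/** Toy scanner that never fails on an unclosed "<|". */
final class RecoveringScanner {
    private final String input;
    private int pos = 0;

    RecoveringScanner(String input) {
        this.input = input;
    }

    Tok nextToken() {
        if (pos >= input.length()) {
            return new Tok(Kind.EOF, pos, pos);
        }
        int start = pos;
        if (input.startsWith("<|", pos)) {
            int close = input.indexOf("|>", pos + 2);
            if (close < 0) {
                // Unterminated: consume the rest of the input as a single
                // INCOMPLETE_DSL token instead of throwing.
                pos = input.length();
                return new Tok(Kind.INCOMPLETE_DSL, start, pos);
            }
            pos = close + 2;
            return new Tok(Kind.DSL, start, pos);
        }
        // Everything up to the next opener (or EOF) is plain text.
        int next = input.indexOf("<|", pos);
        pos = next < 0 ? input.length() : next;
        return new Tok(Kind.TEXT, start, pos);
    }

    /** Collects all tokens, including the final EOF token. */
    static List<Tok> scanAll(String input) {
        RecoveringScanner scanner = new RecoveringScanner(input);
        List<Tok> tokens = new ArrayList<>();
        Tok t;
        do {
            t = scanner.nextToken();
            tokens.add(t);
        } while (t.kind != Kind.EOF);
        return tokens;
    }
}
```

In this sketch the incomplete token covers every remaining character, which is what an editor needs for highlighting; emitting a fake closing |> instead would produce a token with no matching text in the file.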

I'm sure this is a pretty common issue, but I couldn't find any 
documented way to do this in ANTLR 3, either in the book or online.

Any help would be greatly appreciated, Thanks.


