[antlr-interest] What to expect next

Wed Feb 27 11:16:48 PST 2008

Dear list,

I am a contributor to an Eclipse plugin 
(http://sourceforge.net/projects/quantum). I am responsible for the SQL 
editor and want to improve the user experience by offering content assist. 
We use ANTLR 2.7.x to check the syntax. That works reasonably well, 
but I would like to have more information on incomplete statements.

Using actions I have divided a SQL statement into several parts and have 
managed to base content assist on that. But this is not as 
fine-grained as I want it to be. What I would really like is for the 
syntax check to tell us what it expects next: a column definition, a 
keyword, a table and so on. The content assist would then be able to 
show just those proposals. The problem is of course that the statement 
the user is entering is not yet valid, so it will cause an error. Somehow 
I do not get as many "expected <this> found 'null'" errors as I expected 
(if you get my expectation ;-)).
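
To make the goal concrete, here is a minimal sketch in plain Java of the 
behaviour I am after. It is only an illustration: it is a hand-written 
recognizer for a toy rule, not ANTLR-generated code, and the class name, 
the token names (STAR, TABLE_NAME, VIEW_NAME, LPAREN, RPAREN) and the 
method expectedNext are all made up for this example. The idea is that a 
mismatch against end of input is not treated as an error but as the 
answer to "what is expected next".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: a hand-written recognizer for the toy rule
//   query : SELECT STAR FROM ( TABLE_NAME | VIEW_NAME | LPAREN query RPAREN ) ;
// It uses no ANTLR classes; token types are plain strings.
class ExpectedNextSketch {
    private final List<String> tokens;
    private int pos = 0;
    // Filled in only when the input ends where the rule still needs a token.
    private final List<String> expectedAtEnd = new ArrayList<String>();

    private ExpectedNextSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    // Type of the current lookahead token, or "EOF" past the end of input.
    private String la() {
        return pos < tokens.size() ? tokens.get(pos) : "EOF";
    }

    // Consume the lookahead if it is one of the alternatives. If the input
    // has simply run out, record the alternatives as "expected next".
    private boolean match(String... alternatives) {
        List<String> alts = Arrays.asList(alternatives);
        if (alts.contains(la())) {
            pos++;
            return true;
        }
        if (la().equals("EOF")) {
            expectedAtEnd.addAll(alts);
        }
        return false;
    }

    private boolean query() {
        if (!match("SELECT") || !match("STAR") || !match("FROM")) {
            return false;
        }
        return tableRef();
    }

    private boolean tableRef() {
        if (!match("TABLE_NAME", "VIEW_NAME", "LPAREN")) {
            return false;
        }
        // A parenthesis opens a subquery, which must be closed again.
        if (tokens.get(pos - 1).equals("LPAREN")) {
            return query() && match("RPAREN");
        }
        return true;
    }

    // Returns the token types that could legally follow the given input,
    // or an empty list if the input is complete or wrong before its end.
    static List<String> expectedNext(String... input) {
        ExpectedNextSketch p = new ExpectedNextSketch(Arrays.asList(input));
        p.query();
        return p.expectedAtEnd;
    }

    public static void main(String[] args) {
        // prints [TABLE_NAME, VIEW_NAME, LPAREN]
        System.out.println(expectedNext("SELECT", "STAR", "FROM"));
    }
}
```

For the input SELECT * FROM this yields the three alternatives (table, 
view, subquery), which is the list I would like to hand to the content 
assist. Whether something comparable can be obtained from an ANTLR 2.7.x 
parser is exactly what I am asking.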

I have added exception handling around my entry rule, hoping that if one 
statement fails to parse, the next one still might. So I consume 
everything up to and including the next SEMICOLON. I think that my 
implementation somehow hides the errors I want to catch.

sql_script:
    sql_stmt
    (
        s:SEMICOLON!
        {currentStatement.setLength(#s.getColumn()-currentStatement.getOffset());}
        (sql_stmt)?
    )*
    {
        ##=#([SQL_SCRIPT,"script"], #sql_script);
    }
;
exception
catch [MismatchedTokenException mce]
{
    QError e = new QError();
    // .... omitted for brevity
    errors.add(e);
    if(LA(1)==SEMICOLON)
    {
        consume();
        returnAST = sql_script_AST;
        sql_stmt();
        return;
    }
    consume();
    while (LA(1) != Token.EOF_TYPE && (LA(1)!=SEMICOLON)) {
        consume();
    }
    if(LA(1)==SEMICOLON){
        consume();
    }
    returnAST = sql_script_AST;
    sql_stmt();
    return;
}
catch [NoViableAltException nvae]
{
    QError e = new QError();
    // .... omitted for brevity
    errors.add(e);
    if(LA(1)==SEMICOLON)
    {
        consume();
        sql_stmt();
        return;
    }
    consume();
    while (LA(1) != Token.EOF_TYPE && (LA(1)!=SEMICOLON)) {
        consume();
    }
    if(LA(1)==SEMICOLON){
        consume();
    }
    if(LA(1)!=EOF){
        sql_script();
    }
    return;
}
catch [RecognitionException re]
{
    QError e = new QError();
    // .... omitted for brevity
    errors.add(e);
    if(LA(1)==SEMICOLON)
    {
        consume();
        returnAST = sql_script_AST;
        sql_stmt();
        return;
    }
    consume();
    while (LA(1) != Token.EOF_TYPE && (LA(1)!=SEMICOLON)) {
        consume();
    }
    if(LA(1)==SEMICOLON){
        consume();
    }
    returnAST = sql_script_AST;
    sql_stmt();
    return;
}
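
The three handlers above share the same resynchronization step: skip to 
the next SEMICOLON and consume it. A minimal sketch of that step as one 
standalone helper, in plain Java with no ANTLR classes, so that it can be 
run in isolation. The class name, the method name resync, the int-array 
token stream and the token type values are assumptions made up for this 
illustration.

```java
// Hypothetical sketch, not ANTLR API: token types are plain ints and the
// token stream is an int array that is assumed to end with EOF_TYPE.
class SkipToSemicolonSketch {
    static final int EOF_TYPE = 1;   // assumed value, stands in for Token.EOF_TYPE
    static final int SEMICOLON = 2;  // assumed token type for ';'

    // Starting at the offending token, returns the index of the first token
    // after the next SEMICOLON, or the index of EOF if there is none.
    static int resync(int[] types, int pos) {
        while (types[pos] != EOF_TYPE && types[pos] != SEMICOLON) {
            pos++; // skip the rest of the broken statement
        }
        if (types[pos] == SEMICOLON) {
            pos++; // consume the statement terminator itself
        }
        return pos;
    }

    public static void main(String[] args) {
        // Two statements: the first is broken, the second starts at index 3.
        int[] types = {5, 6, SEMICOLON, 7, SEMICOLON, EOF_TYPE};
        System.out.println(resync(types, 0)); // prints 3
    }
}
```

One difference from my handlers: they call consume() once unconditionally 
before the loop, which also happens when the offending token is EOF, 
whereas this sketch never steps past EOF. In the grammar the same loop 
could presumably live in one method in the parser's class body and be 
called from each catch block.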

I have a similar exception handling block around the sql_stmt rule. (The 
stuff I am doing to the returnAST is also sub-optimal.)

sql_stmt
:
      sql_data_stmt
        {statements.put(currentStatement.getStatementNumber(), currentStatement);}
        // this statement had correct syntax.
    | sql_schema_stmt
        {statements.put(currentStatement.getStatementNumber(), currentStatement);}
    | sql_transaction_stmt
    |
    ( options {generateAmbigWarnings=false;}:
        // Keeping this order avoids the clash of the "set" statements
        // due to the linear approximation of the lookahead
        sql_session_stmt     // LA(1) is surely "set"
      | sql_connection_stmt
    )
    | sql_dyn_stmt
    | system_descriptor_stmt
    | get_diag_stmt
    | declare_cursor
    | temporary_table_decl
;
exception
catch [MismatchedTokenException mce]
{
... see above...

So my questions are:
1) Can ANTLR offer a list of things it expects next? For example, if the 
statement is "SELECT * FROM", can ANTLR tell me that it expects a table 
definition? Whether that would be a table, view or subquery depends on 
the grammar, I think. I would like all three alternatives, and the 
grammar for completed statements does support all three.
2) In what order should I catch the exceptions? I want as much info on 
the error as possible. Do I need one catch [Exception e] and then handle 
the subtypes in the catch block? Which rule should I call in the 
exception block: sql_script or sql_stmt?
3) Am I taking the correct approach here?

Any suggestions would be greatly appreciated.

Kind regards,

Jan