[antlr-interest] Early exit exception in positive closures

Sam Harwell sharwell at pixelminegames.com
Tue Mar 10 06:58:02 PDT 2009


When at all possible, avoid using semantic predicates, as they will
greatly slow down your grammar relative to a properly left-factored
grammar. If the negated set works, it is probably your best option for
now. Maybe I can add a subtract set operation you can test, although I
couldn't check it in officially since it deviates from the grammar
language spec. Then you could write the following with no predicates (no
predicates with short lookahead is optimal):

// the keyword rule must be a set for this to work, which is "any single
one of the following tokens"
(identifier | keyword-(KW123|KW456))+ rule_with_keywords?

You never responded to my offer to check out making the ANTLR Tool
faster at analyzing your grammar. 20 seconds is a long time and I'm sure
it can be improved. :)

Sam

-----Original Message-----
From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Andreas Meyer
Sent: Tuesday, March 10, 2009 6:46 AM
To: antlr-interest at antlr.org
Subject: [antlr-interest] Early exit exception in positive closures

Hi!

I have a question regarding predicates and positive closures like 
({pred}? rule)+. I want to use the predicate in order to disambiguate 
things like :

  (identifier|keyword)+ rule_with_keywords?

Previously, I solved this by manually assembling a FIRST-set of 
rule_with_keywords and putting this into a negated syntactic predicate:

  ((~(KW123|KW456))=>(identifier|keyword))+ rule_with_keywords?

Now, I asked myself: why not use {false}? as predicate, instead? It's 
only used when the actual input contains an ambiguity, so it would (at 
runtime) prefer to exit the subrule, which is exactly what I want. 
However, it seems that the predicate also serves as something like a 
check _before_ even doing the first iteration. So, for an input like 
"KW123 KW123" the generated parser complains about an "early exit 
exception" when I use {false}?.

Would it make sense to add an option to ANTLR like 
"do_not_check_for_early_exit"? This would greatly simplify my grammar, 
as rules like these occur very often. Sure, this also would work:

  (identifier|keyword) ({false}? (identifier|keyword))*
rule_with_keywords?

but of course it's more redundant.

Greetings,
Andreas Meyer


--------------

grammar Lang;

options
{
output=AST;
}

@members {
    public boolean match( String rulename ) {
        boolean didMatch = false;
        int mark = input.mark();
        try {
            java.lang.reflect.Method m = this.getClass().getMethod( 
rulename );
            state.backtracking ++;
            m.invoke( this );
            didMatch = true;
        }
        catch( NoSuchMethodException e )
        {
            e.printStackTrace();
            didMatch = false;
        }
        catch( IllegalAccessException e )
        {
            e.printStackTrace();
            didMatch = false;
        }
        catch (java.lang.reflect.InvocationTargetException e) {
            if( e.getCause() instanceof RecognitionException )
                didMatch = false;
        }
        state.backtracking --;
        input.rewind (mark);
        return didMatch;   
    }
}

start
    : (stmt '.')+ EOF
    ;


// this works, because at the first occurrence of atom, the "predicate" 
is not evaluated ...
//stmt    : KW_3 atom ( {false}? atom)* option?
//    ;

// however, here, the  generated code tries to disambiguate, although 
there is no need to,
// because '+' tells that at least one atom is wanted
stmt
    : KW_3 ( {false}? atom)+ option?
    ;

//stmt    : KW_3 ( {!match("option")}? atom)+ option?
//    ;


option
    : KW_1^ KW_2
    | KW_1^
    | KW_2^
    ;    


atom
    : ID
    | keyword
    ;
   

keyword
    : KW_1
    | KW_2
    | KW_3
    | KW_4
    ;

KW_1 : 'kw1';
KW_2 : 'kw2';
KW_3 : 'kw3';
KW_4 : 'kw4';   

ID: ('a'..'z' | 'A'..'Z' )+ ;
WS : (' '|'\n'|'\r'|'\t') {$channel=HIDDEN;} ;

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe:
http://www.antlr.org/mailman/options/antlr-interest/your-email-address


More information about the antlr-interest mailing list