[antlr-interest] Early exit exception in positive closures
Sam Harwell
sharwell at pixelminegames.com
Tue Mar 10 06:58:02 PDT 2009
When at all possible, avoid using semantic predicates, as they will
greatly slow down your grammar relative to a properly left-factored
grammar. If the negated set works, it is probably your best option for
now. Maybe I can add a subtract set operation you can test, although I
couldn't check it in officially since it deviates from the grammar
language spec. Then you could write the following with no predicates (no
predicates with short lookahead is optimal):
// the keyword rule must be a set for this to work, which is "any single
one of the following tokens"
(identifier | keyword-(KW123|KW456))+ rule_with_keywords?
You never responded to my offer to check out making the ANTLR Tool
faster at analyzing your grammar. 20 seconds is a long time and I'm sure
it can be improved. :)
Sam
-----Original Message-----
From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Andreas Meyer
Sent: Tuesday, March 10, 2009 6:46 AM
To: antlr-interest at antlr.org
Subject: [antlr-interest] Early exit exception in positive closures
Hi!
I have a question regarding predicates and positive closures like
({pred}? rule)+. I want to use the predicate in order to disambiguate
things like :
(identifier|keyword)+ rule_with_keywords?
Previously, I solved this by manually assembling a FIRST-set of
rule_with_keywords and putting this into a negated syntactic predicate:
((~(KW123|KW456))=>(identifier|keyword))+ rule_with_keywords?
Now, I asked myself: why not use {false}? as predicate, instead? It's
only used when the actual input contains an ambiguity, so it would (at
runtime) prefer to exit the subrule, which is exactly what I want.
However, it seems that the predicate also serves as something like a
check _before_ even doing the first iteration. So, for an input like
"KW123 KW123" the generated parser complains about an "early exit
exception" when I use {false}?.
Would it make sense to add an option to ANTLR like
"do_not_check_for_early_exit"? This would greatly simplify my grammar,
as rules like these occur very often. Sure, this also would work:
(identifier|keyword) ({false}? (identifier|keyword))*
rule_with_keywords?
but of course it's more redundant.
Greetings,
Andreas Meyer
--------------
grammar Lang;
options
{
output=AST;
}
@members {
public boolean match( String rulename ) {
boolean didMatch = false;
int mark = input.mark();
try {
java.lang.reflect.Method m = this.getClass().getMethod(
rulename );
state.backtracking ++;
m.invoke( this );
didMatch = true;
}
catch( NoSuchMethodException e )
{
e.printStackTrace();
didMatch = false;
}
catch( IllegalAccessException e )
{
e.printStackTrace();
didMatch = false;
}
catch (java.lang.reflect.InvocationTargetException e) {
if( e.getCause() instanceof RecognitionException )
didMatch = false;
}
state.backtracking --;
input.rewind (mark);
return didMatch;
}
}
start
: (stmt '.')+ EOF
;
// this works, because at the first occurrence of atom, the "predicate"
is not evaluated ...
//stmt : KW_3 atom ( {false}? atom)* option?
// ;
// however, here, the generated code tries to disambiguate, although
there is no need to,
// because '+' tells that at least one atom is wanted
stmt
: KW_3 ( {false}? atom)+ option?
;
//stmt : KW_3 ( {!match("option")}? atom)+ option?
// ;
option
: KW_1^ KW_2
| KW_1^
| KW_2^
;
atom
: ID
| keyword
;
keyword
: KW_1
| KW_2
| KW_3
| KW_4
;
KW_1 : 'kw1';
KW_2 : 'kw2';
KW_3 : 'kw3';
KW_4 : 'kw4';
ID: ('a'..'z' | 'A'..'Z' )+ ;
WS : (' '|'\n'|'\r'|'\t') {$channel=HIDDEN;} ;
List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe:
http://www.antlr.org/mailman/options/antlr-interest/your-email-address
More information about the antlr-interest
mailing list