[antlr-interest] Why does ANTLR generate code that will never call an OR'd alternative?
Kevin J. Cummings
cummings at kjchome.homeip.net
Sat Aug 21 07:24:24 PDT 2010
On 08/21/2010 04:00 AM, Avid Trober wrote:
> Kevin,
>
> Thanks for taking the time to reply.
>
> I did have the predicate in the identifier rule, but it appears the wrong
> way:
>
> identifier
> : {isToken(input.LT(1))}? IDENTIFIER | IDENTIFIER;
Why can't you do something like:

identifier: i=IDENTIFIER
    { if (isToken($i))
      { // code here for the isToken case
      }
      else
      { // code here (maybe empty) for the other case
      }
    }
    ;
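
To show the shape of that idea outside of ANTLR, here is a minimal Python sketch (not ANTLR output). It assumes the lexer hands back IDENTIFIER for the words in question; `Token`, `KEYWORDS`, `is_token` and `match_identifier` are hypothetical names made up for illustration. The point is only that the rule has a single alternative, so the check runs after the match instead of during prediction:

```python
from dataclasses import dataclass

IDENTIFIER = "IDENTIFIER"          # hypothetical token type name
KEYWORDS = {"TOKEN1", "TOKEN2"}    # hypothetical stand-in for the tokens section


@dataclass
class Token:
    type: str
    text: str


def is_token(tok: Token) -> bool:
    """Stand-in for isToken(): here, 'is the text one of the keywords?'."""
    return tok.text in KEYWORDS


def match_identifier(tokens, pos):
    """Match one IDENTIFIER, then branch in the 'action'.

    Returns (label, new_pos). No alternative has to be predicted, so the
    check cannot be optimized away by the prediction logic.
    """
    tok = tokens[pos]
    if tok.type != IDENTIFIER:
        raise SyntaxError(f"expected IDENTIFIER, got {tok.type}")
    if is_token(tok):
        return ("keyword-like", pos + 1)   # the isToken case
    return ("plain", pos + 1)              # the other case
```

Whether this fits depends on what the lexer actually returns for the keyword text; if it returns a keyword token type rather than IDENTIFIER, the match itself fails before the check is reached.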
> The above still produced code that would never call isToken. The reason I
> did it like above, I thought the predicate had to change the token type
> (from the tokens section value to IDENTIFIER); therefore, the IDENTIFIER
> after the predicate.
>
> Per your email, I tried this:
>
> identifier
> : {isToken(input.LT(1))}? | IDENTIFIER;
The first alternative here is empty, so it won't match (consume) any
token; in order for isToken to be called, the lookahead would have to
*not* be an IDENTIFIER.
> And, ANTLR generated code that would call isToken. But, isToken could also
> be called on the right side of the OR in the 'identifier' rule (see code
> below).
> But, worse:
>
> 1. The identifier rule doesn't work in the above form. I get unexpected
> token exceptions for using a tokens section token as what's meant to be
> non-grammar keywords.
>
> 2. Check out this first "if" for a simple list of tokens...some checks are
> for the value of the token (e.g. TOKEN1, TOKEN10) and others are value
> range checks (e.g. (LA30_0 >= TOKEN2 && LA30_0 <= TOKEN3)). The latter I
> could understand, if it weren't for the fact TOKEN2 and TOKEN3 values are 5
> and 6!
>
>
> if ( (LA30_0 == TOKEN1 || (LA30_0 >= TOKEN2 && LA30_0 <= TOKEN3)
> || (LA30_0 >= TOKEN4 && LA30_0 <= TOKEN5) || (LA30_0 >= TOKEN6 && LA30_0 <=
> TOKEN7) || (LA30_0 >= TOKEN8 && LA30_0 <= TOKEN9) || LA30_0 == TOKEN10 ||
> LA30_0 == TOKEN11 || (LA30_0 >= TOKEN12 && LA30_0 <= TOKEN13)) )
> {
> alt30 = 1;
> }
> else if ( (LA30_0 == IDENTIFIER) )
> {
> int LA30_2 = input.LA(2);
>
> if ( ((isToken(input.LT(1)))) )
> {
> alt30 = 1;
> }
> else if ( (true) )
> {
> alt30 = 2;
> }
> else
> {
> NoViableAltException nvae_d30s2 =
> new NoViableAltException("", 30, 2, input);
>
> throw nvae_d30s2;
> }
> }
> else
> {
> NoViableAltException nvae_d30s0 =
> new NoViableAltException("", 30, 0, input);
>
> throw nvae_d30s0;
> }
> switch (alt30)
> {
> case 1 :
> // ... : {...}?
> {
> root_0 = (object)adaptor.GetNilNode();
>
> if ( !((isToken(input.LT(1)))) )
> {
> throw new FailedPredicateException(input,
> "identifier", "isToken(input.LT(1))");
> }
>
> }
> break;
> case 2 :
> // ... : IDENTIFIER
> {
> root_0 = (object)adaptor.GetNilNode();
>
>
> IDENTIFIER132=(IToken)Match(input,IDENTIFIER,FOLLOW_IDENTIFIER_in_identifier1562);
> IDENTIFIER132_tree =
> (object)adaptor.Create(IDENTIFIER132);
> adaptor.AddChild(root_0,
> IDENTIFIER132_tree);
>
>
> }
> break;
>
> }
>
>
> The only form of the 'identifier' rule I got to work was this:
>
> identifier
> :
> ( 'TOKEN1'
> | 'TOKEN2'
> | 'TOKEN3'
> ...
> | 'TOKEN_ZILLION') { input.LT(-1).Type = IDENTIFIER; }
> | IDENTIFIER;
>
>
> Now, I can use a tokens keyword in a way the parser won't throw an
> exception:
>
> TOKEN1=TOKEN3
>
> And, 'TOKEN3' doesn't trip up the parser.
> (For the above, the rule is:
>
> TOKEN1=identifier
>
> Which never worked before if the right-side of the equal sign was a token in
> the tokens section).
In cases like this, I have done:

keyword : 'TOKEN1'
        | 'TOKEN2'
        | 'TOKEN3'
        ...
        | 'LAST_TOKEN'
        ;

identifier : IDENTIFIER
           | k:keyword
             { #k->setType(IDENTIFIER); }
           ;

(OK, this is with ANTLR 2.7.7 and the C++ target...) but it should be
similar with ANTLR 3.
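
For anyone who wants to see the retyping trick in isolation, here is a rough Python sketch of a hand-written parser doing the same thing. It is not ANTLR code; `Token`, `KEYWORD_TYPES`, the `EQUALS` token type, `parse_identifier` and `parse_assignment` are hypothetical names chosen for this example:

```python
from dataclasses import dataclass

IDENTIFIER = "IDENTIFIER"
# Hypothetical keyword token types, standing in for the tokens section.
KEYWORD_TYPES = {"TOKEN1", "TOKEN2", "TOKEN3"}


@dataclass
class Token:
    type: str
    text: str


def parse_identifier(tokens, pos):
    """identifier : IDENTIFIER | keyword (retyped to IDENTIFIER) ;"""
    tok = tokens[pos]
    if tok.type == IDENTIFIER:
        return tok, pos + 1
    if tok.type in KEYWORD_TYPES:
        # We only get here where the grammar expects an identifier, so the
        # keyword's token type is overwritten in place.
        tok.type = IDENTIFIER
        return tok, pos + 1
    raise SyntaxError(f"expected identifier, got {tok.type}")


def parse_assignment(tokens, pos=0):
    """assignment : TOKEN1 '=' identifier ;  (mirrors TOKEN1=identifier)"""
    if tokens[pos].type != "TOKEN1":
        raise SyntaxError("expected TOKEN1")
    if tokens[pos + 1].type != "EQUALS":
        raise SyntaxError("expected '='")
    ident, _ = parse_identifier(tokens, pos + 2)
    return ident
```

Only the token sitting in identifier position is retyped; the leading TOKEN1 keeps its keyword type, which is why TOKEN1=TOKEN3 can parse without the keyword list leaking into every other rule.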
> I don't like my solution, listing the tokens twice in the grammar file.
> And, would love to know how a pro would solve it. Initially, if I
> should/must take all the tokens out of the tokens section and, perhaps,
> make per-token rules for them???
I wouldn't use a semantic predicate for this; rather, I'd just clobber
the token type when I knew it was an identifier and not a keyword.
This question comes up rather often on this list.
> Regards,
> Trober
--
Kevin J. Cummings
kjchome at rcn.com
cummings at kjchome.homeip.net
cummings at kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)