[antlr-interest] Why does ANTLR generate code that will never call an OR'd alternative?

Sat Aug 21 00:41:44 PDT 2010

On 08/21/2010 03:27 AM, Avid Trober wrote:
> Gerald,
> 
> Thank you very much for your reply.
> 
> There's no alt skipped message in the error log.
> 
> The 'isToken' rule was simply my attempt to have the parser check if the
> token was in the tokens { ... } section.  At runtime, I found the token type
> to always be the value in the tokens { ... } section, even if I tried to
> change it:
> 
> 	isToken	:	{isToken(input.LT(1))}? IDENTIFIER;
> 
> But, 'isToken' would never get called via the generated code, e.g. 
> 
> 	identifier  :  isToken | IDENTIFIER;   // i.e. treat a token in the tokens section as an IDENTIFIER.

You need to move your semantic predicate.  The lookahead sees IDENTIFIER
as the first token of both alternatives.  If you want it to go through
isToken, you need to move the semantic predicate to the "identifier" rule.
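
Something along these lines, as an untested sketch that only reuses the
isToken(...) helper and the IDENTIFIER token from your own grammar (the
layout of the rule is my assumption, not something I have run):

```
identifier
	:	{isToken(input.LT(1))}? IDENTIFIER   // predicate now sits in the deciding rule
	|	IDENTIFIER                           // fallback when the predicate is false
	;
```

Note that the predicate is still only evaluated when LA(1) is IDENTIFIER,
just as in the generated code you quoted, so this helps only if those
words reach the parser as IDENTIFIER tokens in the first place.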

> Therefore, I modified my 'identifier' rule to have each tokens { ... } value
> in it, e.g.
> 
> 	identifier:
> 		( 'TOKEN1' | 'TOKEN2' | ... | 'TOKEN_ELEVENTYTEEN_THOUSAND' )  { input.LT(-1).Type = IDENTIFIER; }
> 		| IDENTIFIER;
> 
> And,  that worked.  That is, if I have "identifier" in the grammar somewhere
> it will now accept an IDENTIFIER, as it always has, but also any 'TOKEN1',
> 'TOKEN2', etc. value found in tokens { ... }
> 
> Personally, I hate this.  It means I need *two* places in my grammar to list
> the keywords, the tokens { ... } section AND the identifier rule.  I'm sure
> there's some way to do it via an action, predicate, whatever.  
> 
> I went down this path due to this recommendation: "The author's
> recommendation is to use ordinary rules and the tokens command." at
> http://www.antlr.org/wiki/display/ANTLR3/Quick+Starter+on+Parser+Grammars+-+No+Past+Experience+Required.
> 
> It appears the tokens section is NOT the thing to do; perhaps it is better
> to have per-token rules, e.g. keyToken1, keyToken2, etc.  But, I can't
> rewrite this grammar and risk breaking other things.  Perhaps I should in
> the future.  Preferably, I'd simply like a way to scan through the tokens
> and, if one is found, note it, then change the token type to IDENTIFIER -
> without listing all the tokens twice in the grammar.
> 
> Any suggestions very, very welcome. 
> 
> Regards,
> Trober
> 
> 
> 
> 
> -----Original Message-----
> From: Gerald Rosenberg [mailto:gerald at certiv.net] 
> Sent: Saturday, August 21, 2010 1:35 AM
> To: Avid Trober
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Why does ANTLR generate code that will never
> call an OR'd alternative?
> 
>   Most likely, the parser generation analysis determined that isToken 
> can never be reached.  Check your error log for an alt skipped message.
> 
> 
> 
> ------ Original Message (Saturday, August 21, 2010 1:01:20 AM) From: Avid Trober ------
> Subject: [antlr-interest] Why does ANTLR generate code that will never call
> an OR'd alternative?
>> For this rule,
>>
>> identifier
>>                  :       isToken | IDENTIFIER;
>>
>> ANTLR generates code that never calls the isToken rule
>> (target=CSharp2):
>>
>>      public MYParser.identifier_return identifier()    // throws RecognitionException [1]
>>      {
>> .
>>              // .  : ( isToken | IDENTIFIER )
>>              int alt30 = 2;
>>              int LA30_0 = input.LA(1);
>>
>>              if ( (LA30_0 == IDENTIFIER) )   //<== token must be IDENTIFIER to call isToken???
>>              {
>>                  int LA30_1 = input.LA(2);
>>
>>                  if ( ((isToken(input.LT(1)))) )  //<== why must LA30_0 == IDENTIFIER to call isToken?
>>                  {
>>                      alt30 = 1;
>>                  }
>>                  else if ( (true) )
>>                  {
>>                      alt30 = 2;
>>                  }
>> .
>>              else                         //<== since not IDENTIFIER, why not call isToken here???
>>              {
>>                  NoViableAltException nvae_d30s0 =
>>                      new NoViableAltException("", 30, 0, input);
>>
>>                  throw nvae_d30s0;
>>              }
>>
>> I would think it's something to do with DFA optimization?   Perhaps that's
>> why IDENTIFIER is checked first.
>>
>> But, if IDENTIFIER is false, why not call isToken???    After all, the rule
>> is IDENTIFIER  ****OR***** isToken.
>>
>> Thanks,
>> Trober
>>
>>
>>
>>
>>
>>

-- 
Kevin J. Cummings
kjchome at rcn.com
cummings at kjchome.homeip.net
cummings at kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)