[antlr-interest] Semantic predicate behaviour with k>1

A Z asicaddress at gmail.com
Fri Oct 15 15:40:12 PDT 2010


Hello,

  I am seeing ANTLR generate unexpected code when using semantic predicates
and am wondering if my grammar or understanding is incorrect. The EBNF has a
rule similar to the following:

rule :
    primary_literal
  | {isIdent(LT(1)->getText(LT(1)),PARAM_IDENT)}?     identifier LBRACKET?
  | {isIdent(LT(1)->getText(LT(1)),SPECPARAM_IDENT)}? identifier (LBRACKET constant_range_expression RBRACKET)?
  | {isIdent(LT(1)->getText(LT(1)),TYPE_IDENT)}?      identifier APOSTROPHE
  | {isIdent(LT(1)->getText(LT(1)),ENUM_IDENT)}?      identifier
  | {isIdent(LT(1)->getText(LT(1)),GENVAR_IDENT)}?    identifier
  | {isIdent(LT(1)->getText(LT(1)),LET_IDENT)}?       identifier LPARAN?
  | {isIdent(LT(1)->getText(LT(1)),GENBLOCK_IDENT)}?  identifier (LBRACKET constant_expression RBRACKET)? PERIOD
  | {isIdent(LT(1)->getText(LT(1)),PACKAGE_IDENT)}?   identifier COLONCOLON constant_primary_package_scope_suffix
  | identifier ((LPARAN list_of_arguments RPARAN)=> LPARAN list_of_arguments RPARAN)? // tf_call
  ;
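
The predicates in the rule rely on a helper, isIdent, whose definition is not
shown. As a rough illustration only, a helper of that shape could be a lookup
in a symbol table. Everything below is an assumption: the ident_kind enum, the
symbol struct and the table contents are hypothetical, and this version takes
a plain C string rather than the string object that getText returns in the C
target.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical identifier kinds; the names mirror the constants in the rule. */
typedef enum {
    PARAM_IDENT, SPECPARAM_IDENT, TYPE_IDENT, ENUM_IDENT,
    GENVAR_IDENT, LET_IDENT, GENBLOCK_IDENT, PACKAGE_IDENT
} ident_kind;

/* One declared name and the kind it was declared as (hypothetical layout). */
typedef struct {
    const char *name;
    ident_kind  kind;
} symbol;

/* Toy table; a real parser would fill this in as declarations are parsed. */
static const symbol symbols[] = {
    { "WIDTH", PARAM_IDENT },
    { "i",     GENVAR_IDENT },
    { "blk",   GENBLOCK_IDENT },
};

/* True only if `text` has been declared with exactly the requested kind. */
static bool isIdent(const char *text, ident_kind kind)
{
    for (size_t n = 0; n < sizeof symbols / sizeof symbols[0]; ++n) {
        if (strcmp(symbols[n].name, text) == 0) {
            return symbols[n].kind == kind;
        }
    }
    /* Undeclared name: no predicate holds, so only the last,
       unpredicated alternative of the rule can apply. */
    return false;
}
```

With a helper like this, an identifier that has not been declared yet fails
every predicate and can only be matched by the final alternative.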

The last alternative covers identifiers that can be forward declared, so
that type is assumed if the identifier is still undefined at this point. I
previously tried doing this by factoring, but that makes the grammar very
difficult to follow and substantially increases the number of rules. With
this rule, ANTLR generates the following:

    else if ( (LA1039_0 == SIMPLE_IDENT) )
    {
        {
            int LA1039_2 = LA(2);
            if ( (LA1039_2 == LBRACKET || LA1039_2 == PERIOD) )
            {
                alt1039=8;
            }
            else if ( (LA1039_2 == APOSTROPHE) )
            {
                alt1039=4;
            }
            else if ( (LA1039_2 == COLONCOLON) )
            {
                alt1039=9;
            }
            else if ( ((isIdent(LT(1)->getText(LT(1)),PARAM_IDENT))) )
            {
                alt1039=2;
            }
            else if ( ((isIdent(LT(1)->getText(LT(1)),SPECPARAM_IDENT))) )
            {
                alt1039=3;
            }
            else if ( ((isIdent(LT(1)->getText(LT(1)),ENUM_IDENT))) )
            {
                alt1039=5;
            }
            else if ( ((isIdent(LT(1)->getText(LT(1)),GENVAR_IDENT))) )
            {
                alt1039=6;
            }
            else if ( ((isIdent(LT(1)->getText(LT(1)),LET_IDENT))) )

The first three conditions look out of place. It appears that even with
predicates, ANTLR will increase k if it thinks that helps resolve
ambiguities. Chapter 13 in the book doesn't appear to describe cases like
this. The first condition can't work, because three different alternatives
(PARAM_IDENT, SPECPARAM_IDENT and GENBLOCK_IDENT) can match identifier
LBRACKET. If I force k=1 for this rule, the code is generated as expected.
Strangely, removing the PERIOD from the GENBLOCK alternative also works, but
breaks the grammar. Is this expected behaviour?
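
To make the concern concrete, here is a small C model of the two decision
orders for a SIMPLE_IDENT. It is a sketch, not ANTLR output. The token and
kind enums and the functions predict_k2 and predict_k1 are hypothetical
names. Alternative numbers follow the order of the rule, with primary_literal
as alternative 1. predict_k2 follows the generated chain quoted above;
predict_k1 is only my assumption of what predicate-first prediction looks
like. The quoted code is cut off after the LET_IDENT test, so the last two
branches of predict_k2 are guesses.

```c
/* Hypothetical second-lookahead tokens; OTHER stands for anything else. */
typedef enum { LBRACKET, PERIOD, APOSTROPHE, COLONCOLON, OTHER } token;

/* Hypothetical result of the symbol lookup for the identifier at LT(1). */
typedef enum {
    K_PARAM, K_SPECPARAM, K_TYPE, K_ENUM, K_GENVAR,
    K_LET, K_GENBLOCK, K_PACKAGE, K_UNDECLARED
} kind;

/* Models the quoted k>1 chain: LA(2) is tested before any predicate. */
static int predict_k2(kind k, token la2)
{
    if (la2 == LBRACKET || la2 == PERIOD) return 8;  /* GENBLOCK */
    if (la2 == APOSTROPHE)                return 4;  /* TYPE */
    if (la2 == COLONCOLON)                return 9;  /* PACKAGE */
    if (k == K_PARAM)                     return 2;
    if (k == K_SPECPARAM)                 return 3;
    if (k == K_ENUM)                      return 5;
    if (k == K_GENVAR)                    return 6;
    if (k == K_LET)                       return 7;  /* guess: quote ends here */
    return 10;                                       /* guess: tf_call fallback */
}

/* Assumed predicate-first order: only the identifier kind decides. */
static int predict_k1(kind k)
{
    switch (k) {
    case K_PARAM:     return 2;
    case K_SPECPARAM: return 3;
    case K_TYPE:      return 4;
    case K_ENUM:      return 5;
    case K_GENVAR:    return 6;
    case K_LET:       return 7;
    case K_GENBLOCK:  return 8;
    case K_PACKAGE:   return 9;
    default:          return 10;  /* undeclared: tf_call */
    }
}
```

In this model a parameter followed by LBRACKET, predict_k2(K_PARAM, LBRACKET),
selects alternative 8 instead of 2, while predict_k1(K_PARAM) selects 2.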

