[antlr-interest] Semantic predicates that aren't & hoisting

David Jung jungdl at ornl.gov
Thu Mar 10 07:57:15 PST 2005


I have a large grammar for a complex custom programming language
that I've implemented in ANTLR.  However, there is one problem
I have been unable to solve.  I don't think there is a clean
way to solve it within the current ANTLR framework (2.7.4), but
I could be wrong - hence my query.

I have some parser rules something like this (to simplify):

expr: additiveExpr ;

additiveExpr: multExpr ( ('+' | '-') multExpr )* ;

multExpr: ...

...

primaryExpr:
   const
 | buildinType
 ...
 | expressionList
 ;

expressionList : '{' expr ( ';' expr )* '}' ;


The important rule for this discussion is expressionList.
So, a valid expr can be "5" or "{5; 7; 8}"
or
"{
   {
     2;
     3;
   };     <-- notice this ';'
   5;
 }
"

My problem is that I want to eliminate the requirement for
the separating ';' after a sub-expressionList.  I cannot just make
the ';' optional in the expressionList rule (i.e. (';')?), as
that makes expr ambiguous: the parser can't distinguish between
expr "3" followed by expr "-2" and the single expr "3-2" (evaluating to 1).
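
To make the ambiguity concrete, here is a minimal Python sketch. It is not
ANTLR output; is_expr and splits are hypothetical helpers, and the regular
expression is a deliberately tiny stand-in for the real expr rule (optional
unary sign, a number, then any number of +/- number pairs). It enumerates
every way a token sequence can be cut into complete exprs when no separator
is required.

```python
import re

# Simplified stand-in for expr: [sign] NUM ( (+|-) NUM )*, tokens joined by spaces.
EXPR = re.compile(r"(?:[+-] )?\d+(?: [+-] \d+)*")


def is_expr(tokens):
    """True if the token list forms exactly one complete expr."""
    return bool(EXPR.fullmatch(" ".join(tokens)))


def splits(tokens):
    """Yield every way to cut tokens into a sequence of complete exprs,
    as if the ';' separator were optional everywhere."""
    if not tokens:
        yield []
        return
    for i in range(1, len(tokens) + 1):
        head = tokens[:i]
        if is_expr(head):
            for rest in splits(tokens[i:]):
                yield [head] + rest


if __name__ == "__main__":
    for parse in splits(["3", "-", "2"]):
        print(parse)
```

With the separator optional, the tokens 3 - 2 admit two readings: one expr
"3 - 2", or expr "3" followed by expr "- 2".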

I think it may be possible to solve this by having two paths through
all the rules between expr and primaryExpr, but that would
mean duplicating almost my whole grammar (of Java/C++ order of complexity),
and it isn't very natural.

So, my first attempt at a solution went like this (using a semantic
predicate):

expressionList
  : '{' expr ( conditionalSEMI expr )* '}' ;

conditionalSEMI : {<just-saw-'}'-token>}? | SEMI ;

While this works for expressionList, it generates many ANTLR
ambiguity warnings all over the grammar - which makes me uncomfortable.
This uses ANTLR's semantic predicate notation, but it is really
acting as a syntactic predicate.  Is there any way to have the condition hoisted
into the rule decision logic so that the warnings are eliminated?
(Note that an ANTLR syntactic predicate doesn't work because it
provides a lookahead grammar for prediction, while I need to look
back at the last token.)
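
The look-back condition itself can be illustrated with a small hand-written
recursive-descent parser in Python. This is only a sketch of the intended
behaviour, not ANTLR-generated code: the Parser class and its methods are
hypothetical, expr is reduced to a number or a nested list, and tokens are
plain strings. The parser remembers the most recently consumed token and
lets the separator be omitted only when that token was '}'.

```python
class Parser:
    """Toy parser for: expressionList : '{' expr ( conditionalSEMI expr )* '}'"""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.last = None  # most recently consumed token (the "look-back")

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def consume(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        self.last = tok
        return tok

    def expr(self):
        # Simplified expr: either a nested expressionList or a number.
        if self.peek() == "{":
            return self.expression_list()
        tok = self.consume()
        if not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return int(tok)

    def expression_list(self):
        self.consume("{")
        items = [self.expr()]
        while self.peek() != "}":
            self.conditional_semi()
            items.append(self.expr())
        self.consume("}")
        return items

    def conditional_semi(self):
        # ';' may be omitted only if the previous token closed a sub-list.
        if self.peek() == ";":
            self.consume(";")
        elif self.last != "}":
            raise SyntaxError("missing ';' between expressions")


if __name__ == "__main__":
    print(Parser("{ { 2 ; 3 } 5 }".split()).expr())
```

Here "{ { 2 ; 3 } 5 }" is accepted without a ';' after the inner list, while
"{ 2 3 }" is rejected because the token before "3" is not '}'.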

My second idea for a solution was something like:

expressionList
  : '{' expr ( {conditionallyInsertSEMI();} SEMI expr )* '}' ;

defining the function:

void conditionallyInsertSEMI()
{
  if (<last token wasn't '}'>)
    Insert_SEMI_into_token_stream();
}

However, I don't know how (or if) it is possible to insert
a token into the stream during parsing like that.
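
One way to picture the token-insertion idea is as a filter placed between
lexer and parser, rather than an action inside the parser. The Python sketch
below shows only the filtering logic; insert_semis is a hypothetical name,
tokens are plain strings, and nothing here is ANTLR API. In ANTLR 2.x the
parser pulls its tokens through the TokenStream interface (nextToken()), so
a filter of this shape could presumably be wrapped around the lexer, but
that wiring is an assumption and is not shown.

```python
def insert_semis(tokens):
    """Yield tokens, adding a synthetic ';' after any '}' that is not
    already followed by ';' or by another '}'."""
    prev = None
    for tok in tokens:
        if prev == "}" and tok not in (";", "}"):
            yield ";"  # synthetic separator the grammar still requires
        yield tok
        prev = tok


if __name__ == "__main__":
    print(" ".join(insert_semis("{ { 2 ; 3 } 5 }".split())))
```

With such a filter the grammar could keep the mandatory SEMI and stay
unambiguous. Note that this filter looks only at adjacent tokens, so it
would also insert a ';' after a '}' used by any other construct in the
language; a real version would need to account for that.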


Does anyone who is an experienced ANTLR user know if it is
possible to do what I want cleanly? Or if there is a way
to force hoisting or to insert a token?
Thanks in advance for help - it will be appreciated.
-David.


