[antlr-interest] forced or "gate" semantic predicates

Terence Parr parrt at cs.usfca.edu
Tue Nov 29 15:09:06 PST 2005


So, I'm playing around with real grammars (Python and Ruby) to find
nasty parsing problems (boy did I find them).  In Python, there are
really only lexical problems.  Newline is context-sensitive:
sometimes you want to ignore it and sometimes you don't.  Smells
like we need a predicate, but problems arise.  For example, here is
the comment rule:

COMMENT
@init {
     int startPos = getCharPositionInLine();
}
     :   "#" (~"\n")* { channel=99; }
         ( {startPos==0}? "\n" )+
     ;

You want it to scarf all newlines trailing a comment that starts in
column 0, as they are not statement terminators; we don't want
COMMENT to fall out, which would let NEWLINE have them (annoying the
parser).

The loop:

( {startPos==0}? "\n" )+

*looks* like it would only match the subrule when the predicate
holds, but... the predicate is not even part of the prediction!
Remember, predicates exist to disambiguate syntactic problems.  In
this case, the lexer thinks that as long as newlines are present, it
will scarf them.  No semantics required.
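
To make the point concrete, here is a rough hand-written sketch (in
Python, not actual ANTLR output) of how the loop decision behaves when
the predicate is left out of prediction.  The function name and the
start_pos parameter are made up for illustration; start_pos mirrors
startPos in the rule.  The sketch models only the decision to enter
the loop, not whatever the generated code does with the predicate
afterwards.

```python
def scarf_newlines_ungated(text, pos, start_pos):
    """Simulate the loop decision for ( {startPos==0}? "\n" )+ when the
    predicate is NOT part of prediction.

    Returns (new position, number of newlines consumed).
    """
    consumed = 0
    # Prediction looks at the next character only; start_pos is never
    # consulted when deciding whether to go around the loop again.
    while pos < len(text) and text[pos] == "\n":
        pos += 1
        consumed += 1
    return pos, consumed


# A trailing comment that starts in column 4, followed by two newlines.
text = "x=1 # c\n\ny"
print(scarf_newlines_ungated(text, 7, 4))   # prints (9, 2)
```

Both newlines get consumed even though the comment did not start in
column 0, which is exactly the behavior we did not want.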

This is the problem that Oliver ran into for XML.  He needs a
predicate that "gates" in/out a production or loop.  I know how to
insert that (analysis would ignore it; code gen would simply always
insert it), but what the hell would the syntax be to distinguish?

The parser seems to work naturally with disambiguating predicates but  
the lexer seems to want gating predicates.  Interesting.  Loring  
Craymer had an interesting suggestion at the workshop: use syn pred  
notation around it.  ({...}?)=>

Perhaps a variation:

( {startPos==0}=> "\n" )+

where the action implies when it's ok to proceed.

These would NOT hoist as you'd be evaluating them out of context  
(which is why the {..}? variety are always evaluated *after* the  
lookahead test).  These gate predicates would simply get shoved into  
the lookahead decision for that specific production.
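
A minimal sketch of what "shoved into the lookahead decision" could
look like for the newline loop, again as hand-written Python rather
than real generated code.  The function name and start_pos are
hypothetical (start_pos mirrors startPos in the rule), and the sketch
ignores the "at least once" requirement of (...)+.

```python
def scarf_newlines_gated(text, pos, start_pos):
    """Simulate the loop decision for ( {startPos==0}=> "\n" )+ where
    the gate is tested together with the lookahead.

    Returns the position after any newlines the gate allowed.
    """
    # The gate sits in the same test as the lookahead, so a false gate
    # switches the alternative off regardless of what the input holds.
    while start_pos == 0 and pos < len(text) and text[pos] == "\n":
        pos += 1
    return pos


# Comment in column 0: the trailing newlines belong to COMMENT.
print(scarf_newlines_gated("# top\n\nx = 1", 5, 0))   # prints 7

# Comment in column 6: the newlines are left for NEWLINE.
print(scarf_newlines_gated("x = 1 # c\n\ny", 9, 6))   # prints 9
```

Because the gate only reads state local to this rule invocation,
testing it right at this decision is safe; the same test would be
meaningless if it were evaluated somewhere else.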

The problem is that sometimes you want to distinguish between two  
lexical rules like Oliver did:

TAG_OPEN : { !tagMode } => '<' { tagMode = true; } ;
TAG_CLOSE : { tagMode } => '>' { tagMode = false; } ;

Perhaps, as a special case, gate predicates on the left edge of
lexical rules get hoisted one level into the special Tokens rule that
chooses among lexer rules.
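
Here is a rough sketch of that hoisting, written by hand in Python
under some assumptions: TagLexer, next_token, and the catch-all OTHER
rule are invented for the example, and only tag_mode (tagMode) and the
two tag rules come from the grammar above.  next_token plays the part
of the Tokens rule.

```python
class TagLexer:
    """Toy lexer whose rule selection includes left-edge gates."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.tag_mode = False   # mirrors tagMode in the grammar

    def next_token(self):
        """Stand-in for the Tokens rule: choose a lexer rule using the
        next character AND the gate hoisted from that rule."""
        if self.pos >= len(self.text):
            return ("EOF", "")
        ch = self.text[self.pos]
        self.pos += 1
        if not self.tag_mode and ch == "<":   # gate hoisted from TAG_OPEN
            self.tag_mode = True
            return ("TAG_OPEN", ch)
        if self.tag_mode and ch == ">":       # gate hoisted from TAG_CLOSE
            self.tag_mode = False
            return ("TAG_CLOSE", ch)
        # Hypothetical fallback rule, not part of the original grammar.
        return ("OTHER", ch)


def tokens(text):
    """Collect token types until EOF."""
    lexer = TagLexer(text)
    out = []
    while True:
        kind, _ = lexer.next_token()
        if kind == "EOF":
            return out
        out.append(kind)


print(tokens("<a>>"))
# prints ['TAG_OPEN', 'OTHER', 'TAG_CLOSE', 'OTHER']
```

The second '>' arrives when tag_mode is false, so TAG_CLOSE is gated
out and the character falls through to the fallback rule.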

Heh, I like it!  Adding to blog.  Comments?

Ter

