[antlr-interest] forced or "gate" semantic predicates
Terence Parr
parrt at cs.usfca.edu
Tue Nov 29 15:09:06 PST 2005
So, I'm playing around with real grammars (python and ruby) to find
nasty parsing problems (boy did I find them). In python, there are
only lexical problems really. Newline is context-sensitive:
sometimes you want to ignore it and sometimes you don't. Smells
like we need a predicate, but problems arise. For example, here is
the comment rule:
COMMENT
@init {
    int startPos = getCharPositionInLine();
}
    :   "#" (~"\n")* { channel=99; }
        ( {startPos==0}? "\n" )+
    ;
You want it to scarf all trailing comments as they are not statement
terminators; we don't want COMMENT to fall out, which would let
NEWLINE have them (annoying the parser).
The loop:
( {startPos==0}? "\n" )+
*looks* like it would refuse to match the subrule when the predicate
is false, but... the predicate is not even part of the prediction!
Remember predicates exist to disambiguate syntactic problems. In this
case, the lexer thinks that as long as newlines are present, it will
scarf them. No semantics required.
This is the problem that Oliver ran into for XML. He needs a
predicate that "gates" a production or loop in or out. I know how to
insert that (analysis would ignore it; code gen would simply always
insert it), but what the hell would the syntax be to distinguish the
two kinds of predicate?
The parser seems to work naturally with disambiguating predicates but
the lexer seems to want gating predicates. Interesting. Loring
Craymer had an interesting suggestion at the workshop: use syn pred
notation around it. ({...}?)=>
Perhaps a variation:
( {startPos==0}=> "\n" )+
where the action implies when it's ok to proceed.
These would NOT hoist as you'd be evaluating them out of context
(which is why the {..}? variety are always evaluated *after* the
lookahead test). These gate predicates would simply get shoved into
the lookahead decision for that specific production.
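To make the difference concrete, here is a rough Python sketch of just
the loop-entry decision for the newline subrule. It is not generated
lexer code; the function names are made up for illustration, and it
only models the prediction step (not the "+" minimum of one iteration,
nor anything that happens after the subrule is entered).

```python
# Loop-entry decision for ( {startPos==0}? "\n" )+ as analysis sees it.
# Nothing else competes for "\n" at this point, so the decision is
# syntactically unambiguous and the predicate is never consulted.
def predict_disambiguating(la1, start_pos):
    return la1 == "\n"


# Loop-entry decision for the proposed gate form ( {startPos==0}=> "\n" )+.
# The gate is always tested as part of the decision for this production.
def predict_gated(la1, start_pos):
    return start_pos == 0 and la1 == "\n"


# Count how many leading newlines of `text` the loop would scarf,
# given one of the two prediction functions above.
def newlines_consumed(text, start_pos, predict):
    count = 0
    while count < len(text) and predict(text[count], start_pos):
        count += 1
    return count


trailing = "\n\n\nx = 1"  # what follows the comment text
# Comment began in column 4 (a trailing comment after a statement):
print(newlines_consumed(trailing, 4, predict_disambiguating))
print(newlines_consumed(trailing, 4, predict_gated))
# Comment began in column 0 (a comment on a line by itself):
print(newlines_consumed(trailing, 0, predict_gated))
```

For the column-4 comment the first call eats all three newlines even
though the predicate is false, while the gated version eats none and
leaves them for NEWLINE. For the column-0 comment the gated version
eats all three.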
The problem is that sometimes you want to distinguish between two
lexical rules like Oliver did:
TAG_OPEN : { !tagMode } => '<' { tagMode = true; } ;
TAG_CLOSE : { tagMode } => '>' { tagMode = false; } ;
Perhaps the special case of a gate predicate on the left edge of a
lexical rule gets hoisted one level into the special Tokens rule that
chooses among lexer rules.
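A rough sketch of what that hoisting might amount to, again in Python
and not what the code generator would actually emit. The token-choosing
decision tests each rule's gate alongside the lookahead character. The
tokenize function and the CHAR fallback token are hypothetical, added
only so the example runs; tag_mode stands in for Oliver's tagMode.

```python
def tokenize(text):
    tag_mode = False  # plays the role of tagMode in the lexer
    tokens = []
    for ch in text:
        # Hoisted gates: each alternative of the token-choosing decision
        # checks its rule's gate together with the lookahead character.
        if not tag_mode and ch == "<":
            tokens.append(("TAG_OPEN", ch))
            tag_mode = True   # rule action: { tagMode = true; }
        elif tag_mode and ch == ">":
            tokens.append(("TAG_CLOSE", ch))
            tag_mode = False  # rule action: { tagMode = false; }
        else:
            # Hypothetical catch-all so every character yields a token.
            tokens.append(("CHAR", ch))
    return tokens


for token in tokenize("a>b<c>"):
    print(token)
```

The first ">" comes out as a plain CHAR because TAG_CLOSE is gated out
while tag_mode is false; only the ">" that follows "<" becomes
TAG_CLOSE.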
Heh, I like it! Adding to blog. Comments?
Ter