[antlr-interest] a proposed enhancement to lexers [hoisting?]
Terence Parr
parrt at jguru.com
Sat Aug 31 11:06:44 PDT 2002
Folks,
I have gotten bitten by the "match only at the start of line" issue one
last time! I'm going to fix this. I can see multiple solutions. What
you really want is for this to work:
STAR_AT_LEFT_EDGE
    : {getColumn()==1}? '*'
    ;
STAR
    : '*'
    ;
It doesn't because the autogenerated rule nextToken that invokes these
only considers syntax. I.e., it builds
nextToken() {
    switch (LA(1)) {
        case '*' : STAR_AT_LEFT_EDGE(); break;
        case '*' : STAR(); break;
    }
}
Naturally, this is nondeterministic since '*' starts both rules. One
way to handle this is a rule option that specifies the semantic context
under which the rule applies. But then I realized that a simple form
of HOISTING from the left edge of rules into nextToken()'s
computations would solve this.
All I have to do is look for any semantic predicates on the left edge
of any alternative and hoist them into the prediction expression for
the rules in nextToken(). It will make some things slower. For
example, most stuff gets predicted with a switch, but now '*' would be
checked with
if ( getColumn()==1 && LA(1)=='*' ) STAR_AT_LEFT_EDGE();
else if ( LA(1)=='*' ) STAR();
and if there are many of these, the linear walk could prove a bit more
expensive, but we shouldn't need this very often.
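
To make the idea concrete, here is a minimal hand-written sketch of a
lexer whose nextToken() tests the hoisted column predicate before
falling back to the plain '*' rule. It is not ANTLR-generated code and
not the planned implementation: the class name HoistingSketch, the
string token names, and the tokenize() helper are all hypothetical, and
every character other than '*' is simply skipped to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; ANTLR would generate different code.
class HoistingSketch {
    private final String input;
    private int pos = 0;
    private int column = 1; // 1-based, playing the role of getColumn()

    HoistingSketch(String input) {
        this.input = input;
    }

    // Advance one character, resetting the column after a newline.
    private void consume() {
        char c = input.charAt(pos++);
        column = (c == '\n') ? 1 : column + 1;
    }

    // The predicate from the left edge of STAR_AT_LEFT_EDGE is tested
    // here, in the prediction, before the purely syntactic alternative.
    String nextToken() {
        while (pos < input.length()) {
            char la1 = input.charAt(pos);
            if (column == 1 && la1 == '*') {
                consume();
                return "STAR_AT_LEFT_EDGE";
            } else if (la1 == '*') {
                consume();
                return "STAR";
            } else {
                consume(); // assumption: ignore everything else
            }
        }
        return "EOF";
    }

    // Convenience helper: collect token names until end of input.
    static List<String> tokenize(String text) {
        HoistingSketch lexer = new HoistingSketch(text);
        List<String> out = new ArrayList<>();
        for (String t = lexer.nextToken(); !t.equals("EOF"); t = lexer.nextToken()) {
            out.add(t);
        }
        return out;
    }
}
```

Because the predicated test comes first, a '*' in column 1 is never
handed to STAR, and a '*' anywhere else fails the predicate and falls
through to STAR. The cost is the linear if/else walk in place of a
switch.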
Anyway, it nicely solves the "only apply this rule under these semantic
conditions" problem. Remember that the '^' regex character for
"beginning of a line" is a bit weird, as "beginning of a line" is a
state of the lexer, not a character. This solution would be more
powerful and a more natural expression of your needs.
Sound like the right thing to do? It would probably require the C++
and C# generators to change also to be consistent.
Ter
--
Co-founder, http://www.jguru.com
Creator, ANTLR Parser Generator: http://www.antlr.org