[antlr-interest] a proposed enhancement to lexers [hoisting?]

Sat Aug 31 11:06:44 PDT 2002

Folks,

I have gotten bitten by the "match only at the start of line" issue one 
last time!  I'm going to fix this.  I can see multiple solutions.  What 
you really want is for this to work:

STAR_AT_LEFT_EDGE
	:	{getColumn()==1}? '*'
	;

STAR
	:	'*'
	;

It doesn't because the autogenerated rule nextToken that invokes these 
only considers syntax.  I.e., it builds

nextToken() {
	switch (LA(1)) {
	case '*' : STAR_AT_LEFT_EDGE(); break;
	case '*' : STAR(); break;
	}
}

Naturally this is nondeterministic, since '*' starts both rules.  One 
way to handle this is a rule option that specifies the semantic context 
under which the rule applies.  But then I realized that a simple form 
of HOISTING predicates from the left edge of rules into nextToken()'s 
prediction computations would solve this.

All I have to do is look for any semantic predicates on the left edge 
of any alternative and hoist them into the prediction expression for 
the rules in nextToken().  It will make some things slower.  For 
example, most stuff gets predicted with a switch, but now '*' would be 
checked with

if ( getColumn()==1 && LA(1)=='*' ) STAR_AT_LEFT_EDGE();
else if ( LA(1)=='*' ) STAR();

and if there are many of these, the linear walk could prove a bit more 
expensive, but we shouldn't need this much.
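
To make the idea concrete, here is a small hand-written Java sketch of 
what a nextToken() with a hoisted predicate might look like.  It is not 
ANTLR's generated code: the class name, the LA1() and consume() helpers, 
the column bookkeeping, and the use of strings as token types are all 
assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mini-lexer: nextToken() tests a hoisted predicate, not just LA(1). */
final class HoistedPredicateSketch {
    private final String input;
    private int pos = 0;     // index of the next unread character
    private int column = 1;  // 1-based column of the next unread character

    HoistedPredicateSketch(String input) { this.input = input; }

    // Assumed stand-ins for the lexer's LA(1) and getColumn().
    private char LA1() { return pos < input.length() ? input.charAt(pos) : '\0'; }
    private int getColumn() { return column; }

    // Advance one character; a newline resets the column to 1.
    private void consume() {
        if (LA1() == '\n') column = 1; else column++;
        pos++;
    }

    /** Returns the name of the next token, or null at end of input. */
    String nextToken() {
        while (pos < input.length()) {
            // The predicate from the left edge of STAR_AT_LEFT_EDGE is
            // checked together with the lookahead character.
            if (getColumn() == 1 && LA1() == '*') {
                consume();
                return "STAR_AT_LEFT_EDGE";
            } else if (LA1() == '*') {
                consume();
                return "STAR";
            } else {
                consume(); // this sketch skips every other character
            }
        }
        return null;
    }

    /** Convenience driver: collect all token names for a string. */
    public static List<String> tokenize(String text) {
        HoistedPredicateSketch lexer = new HoistedPredicateSketch(text);
        List<String> tokens = new ArrayList<>();
        for (String t = lexer.nextToken(); t != null; t = lexer.nextToken()) {
            tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("* a*b\n*c *"));
    }
}
```

The predicated test has to come before the plain '*' test; otherwise 
STAR would always win and the left-edge rule would never fire.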

Anyway, it nicely solves the "only apply this rule under these semantic 
conditions" problem.  Remember the '^' regexp char for "beginning of a 
line" is a bit weird, as "beginning of a line" is a state of the lexer, 
not a character.  This solution would be more powerful and a more 
natural expression of your needs.

Sound like the right thing to do?  It would probably require the C++ 
and C# generators to change also to be consistent.

Ter
--
Co-founder, http://www.jguru.com
Creator, ANTLR Parser Generator: http://www.antlr.org
