[antlr-interest] Serious Bug when using BitSet generation

Olivier Dragon dragonoe at mcmaster.ca
Wed Nov 9 11:34:15 PST 2005


On Wed, Nov 09, 2005 at 09:15:56AM -0800, Terence Parr wrote:
> First, the => pred is totally unnecessary and antlr removes it:
> 
> 	switch(LA(1)) {
> 	...
> 		case '.':
> 		{
> 			match('.');
> 
> The code it generates seems correct.  Can you tell me what path  
> through the code seems bad?

As I see it, the problem is that the "match('.')" in the code above
should be protected by a syntactic predicate, as written in the grammar.
I've created a simpler test that might help pin this down. To get the
correct behaviour I have to raise codeGenMakeSwitchThreshold above the
number of alternatives. Here is the grammar, used with the default
codeGenMakeSwitchThreshold value.


class SimpleSynPredTest extends Lexer;
options {
    exportVocab=SimpleSynPredTest; // call the vocabulary "SimpleSynPredTest"
    testLiterals=false;    // don't automatically test for literals
    k=1;                   // character lookahead
}

OP: ".gt.";

NUMERAL:
	('0'..'9')+ // integer
	(
		'h' | // hex
		('.' ('0'..'9'| ~('g') )) =>
		'.' ('0'..'9')* // real
	)?
	;


This grammar generates code that causes a lexical error on the simple
string "100.gt.1000" (this is Fortran syntax). Instead of tokenizing it
as "100" (integer NUMERAL), ".gt." (OP) and "1000" (integer NUMERAL),
the lexer creates "100.", which is valid syntax for a real in Fortran,
and then breaks on the "g":

line 1:5: unexpected char: 'g'
	at SimpleSynPredTest.nextToken(SimpleSynPredTest.java:81)
	at Main.main(Main.java:23)

This is the offending code (switch case '.'):


		switch ( LA(1)) {
		case 'h':
		{
			match('h');
			break;
		}
		case '.':
		{
			match('.');
			{
			_loop23:
			do {
				if (((LA(1) >= '0' && LA(1) <= '9'))) {
					matchRange('0','9');
				}
				else {
					break _loop23;
				}
				
			} while (true);
			}
			break;
		}
		default:
			{
			}
		}
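
To make the failure easier to reproduce without ANTLR, here is a minimal
hand-written sketch of the two behaviours. It is not ANTLR's generated
code: the class SynPredSketch, its methods, and the token strings are all
hypothetical names, and end of input is assumed to fail the predicate.
With usePredicate=false it consumes the '.' unconditionally, like the
switch above; with usePredicate=true it first checks the character after
the '.'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the generated lexer; not ANTLR code.
class SynPredSketch {
    private final String input;
    private final boolean usePredicate;
    private int pos = 0;

    SynPredSketch(String input, boolean usePredicate) {
        this.input = input;
        this.usePredicate = usePredicate;
    }

    // Lookahead character i (1-based); '\0' marks end of input (assumption).
    private char la(int i) {
        int p = pos + i - 1;
        return p < input.length() ? input.charAt(p) : '\0';
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Mirrors ('.' ('0'..'9' | ~('g'))) => ; end of input fails (assumption).
    private boolean realSuffixAhead() {
        char next = la(2);
        return la(1) == '.' && next != '\0' && next != 'g';
    }

    private String numeral() {
        int start = pos;
        while (isDigit(la(1))) pos++;
        if (la(1) == 'h') {
            pos++;                               // hex suffix
        } else if (la(1) == '.' && (!usePredicate || realSuffixAhead())) {
            pos++;                               // consume '.'
            while (isDigit(la(1))) pos++;        // fractional digits
        }
        return "NUMERAL(" + input.substring(start, pos) + ")";
    }

    private String nextToken() {
        if (isDigit(la(1))) return numeral();
        if (input.startsWith(".gt.", pos)) {
            pos += 4;
            return "OP(.gt.)";
        }
        throw new IllegalStateException(
            "unexpected char: '" + la(1) + "' at column " + (pos + 1));
    }

    static List<String> tokenize(String input, boolean usePredicate) {
        SynPredSketch lexer = new SynPredSketch(input, usePredicate);
        List<String> tokens = new ArrayList<>();
        while (lexer.pos < input.length()) tokens.add(lexer.nextToken());
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("100.gt.1000", true));
        try {
            tokenize("100.gt.1000", false);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Without the predicate the sketch builds "100." and then stops on the 'g'
at column 5, which matches the error reported above.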


The switch-based code is generated with the default value of
codeGenMakeSwitchThreshold, which I presume is 1. When I raise the value
above the number of alternatives (there are 2 here, so I set it to 3), I
get the following code, which tokenizes correctly without throwing
exceptions. I have been running lexers generated with the higher
codeGenMakeSwitchThreshold value on a lot of Fortran code, and so far
they haven't failed me.

		if ((LA(1)=='h')) {
			match('h');
		}
		else {
			boolean synPredMatched55 = false;
			if (((LA(1)=='.'))) {
				int _m55 = mark();
				synPredMatched55 = true;
				inputState.guessing++;
				try {
					{
					match('.');
					{
					if (((LA(1) >= '0' && LA(1) <= '9'))) {
						matchRange('0','9');
					}
					else if ((_tokenSet_0.member(LA(1)))) {
						{
						match(_tokenSet_0);
						}
					}
					else {
						throw new NoViableAltForCharException((char)LA(1), getFilename(), getLine(), getColumn());
					}
					
					}
					}
				}
				catch (RecognitionException pe) {
					synPredMatched55 = false;
				}
				rewind(_m55);
				inputState.guessing--;
			}
			if ( synPredMatched55 ) {
				match('.');
				{
				_loop57:
				do {
					if (((LA(1) >= '0' && LA(1) <= '9'))) {
						matchRange('0','9');
					}
					else {
						break _loop57;
					}
					
				} while (true);
				}
			}
			else {
			}
			}
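
The mark / guess / rewind pattern in the generated code can be reduced to
a few lines. This is only a sketch under stated assumptions: GuessSketch
and its methods are hypothetical, IllegalStateException stands in for
RecognitionException, and end of input is assumed to fail the predicate.
It shows that the predicate is tried speculatively and that the input
position is restored whether or not it matched.

```java
// Hypothetical illustration of speculative matching; not ANTLR code.
class GuessSketch {
    private final String input;
    private int pos;
    private int guessing = 0;   // > 0 while a predicate is being evaluated

    GuessSketch(String input, int start) {
        this.input = input;
        this.pos = start;
    }

    int position() {
        return pos;
    }

    // Current lookahead character; '\0' marks end of input (assumption).
    private char la1() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    private void match(char c) {
        if (la1() != c) throw new IllegalStateException("expected '" + c + "'");
        pos++;
    }

    // Body of the predicate: '.' ('0'..'9' | ~('g'))
    private void predicateBody() {
        match('.');
        char c = la1();
        if (c == '\0' || c == 'g') {
            throw new IllegalStateException("no viable alternative");
        }
        pos++;
    }

    // Try the predicate, then restore the position regardless of the result.
    boolean synPredMatches() {
        int mark = pos;          // like mark()
        boolean matched = true;
        guessing++;
        try {
            predicateBody();
        } catch (IllegalStateException e) {
            matched = false;
        }
        pos = mark;              // like rewind(mark)
        guessing--;
        return matched;
    }
}
```

Starting at the '.' in "100.5" the predicate matches, while at the '.' in
"100.gt.1000" it does not; in both cases the position is unchanged
afterwards, so the real match('.') only runs when the guess succeeded.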


Does this make things clearer? Let me know if I can help more.

-Olivier


-- 
          __-/|    ? ?     |\-__
     __--/  /  \   (^^)   /  \  \--__
  _-/   /   /  /\ / ( )  /\  \   \   \-_
 /  /   /  /  /  (   ^^ ~  \  \  \   \  \
 / Oli Dragon    ( dragonoe at mcmaster.ca \
/  B.Eng. Sfwr   (     )    \    \  \    \
/  /  /    /__--_ (   ) __--__\    \  \  \
|  /  /  _/        \_ \_       \_  \  \  |
 \/  / _/            \_ \_       \_ \  \/
  \_/ /                -\_\        \ \_/
    \/                    )         \/
                        *~
        ___--<***************>--___
       [http://dragon.homelinux.org]
        ~~~--<***************>--~~~