[antlr-interest] Serious Bug when using BitSet generation
Olivier Dragon
dragonoe at mcmaster.ca
Wed Nov 9 11:34:15 PST 2005
On Wed, Nov 09, 2005 at 09:15:56AM -0800, Terence Parr wrote:
> First, the => pred is totally unnecessary and antlr removes it:
>
> switch(LA(1)) {
> ...
> case '.':
> {
> match('.');
>
> The code it generates seems correct. Can you tell me what path
> through the code seems bad?
The problem, as I see it, is that the "match('.')" in the code above
should be guarded by the syntactic predicate from the grammar, but it is
not. I've created a simpler test which might help pinpoint this. To get
the correct behaviour I have to raise codeGenMakeSwitchThreshold above
the number of alternatives. Here is the grammar, used with the default
codeGenMakeSwitchThreshold value:
    class SimpleSynPredTest extends Lexer;

    options {
        exportVocab=SimpleSynPredTest; // call the vocabulary "SimpleSynPredTest"
        testLiterals=false;            // don't automatically test for literals
        k=1;                           // character lookahead
    }

    OP: ".gt.";

    NUMERAL:
        ('0'..'9')+                    // integer
        (
            'h' |                      // hex
            ('.' ('0'..'9'| ~('g') )) =>
            '.' ('0'..'9')*            // real
        )?
        ;
This grammar generates code that raises a lexical error on the simple
string "100.gt.1000" (this is Fortran syntax). Instead of tokenizing it
as "100" (integer NUMERAL), ".gt." (OP) and "1000" (integer NUMERAL),
the lexer matches "100.", which is valid syntax for a real in Fortran,
and then fails on the "g":
line 1:5: unexpected char: 'g'
at SimpleSynPredTest.nextToken(SimpleSynPredTest.java:81)
at Main.main(Main.java:23)
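
To make the failure easy to reproduce without the ANTLR runtime, here is a small hand-written sketch of what the switch-based code effectively does for NUMERAL: once the digits are matched, a following '.' is consumed unconditionally. This is not ANTLR output; the class and method names (GreedyNumeralDemo, lexNumeralGreedy) are invented for illustration only.

```java
// Hand-written sketch, NOT ANTLR-generated code. All names are hypothetical.
class GreedyNumeralDemo {
    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Mimics the switch-based NUMERAL rule: one or more digits, then an
    // optional 'h' or an unguarded '.' followed by zero or more digits.
    // Returns the index just past the matched text.
    static int lexNumeralGreedy(String in, int pos) {
        int p = pos;
        while (p < in.length() && isDigit(in.charAt(p))) {
            p++;
        }
        if (p == pos) {
            throw new IllegalStateException("expected a digit at index " + pos);
        }
        if (p < in.length() && in.charAt(p) == 'h') {
            p++;                       // hex suffix
        } else if (p < in.length() && in.charAt(p) == '.') {
            p++;                       // no predicate: the '.' is always consumed
            while (p < in.length() && isDigit(in.charAt(p))) {
                p++;
            }
        }
        return p;
    }

    public static void main(String[] args) {
        String input = "100.gt.1000";
        int end = lexNumeralGreedy(input, 0);
        // The '.' that belongs to ".gt." has been swallowed by NUMERAL.
        System.out.println("NUMERAL = \"" + input.substring(0, end) + "\"");
        System.out.println("next char = '" + input.charAt(end) + "'");
    }
}
```

In this sketch NUMERAL ends at index 4, so the next character handed to the lexer is the 'g', which matches no rule. That corresponds to the "line 1:5" position in the error above.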
This is the offending code (switch case '.'):
    switch ( LA(1)) {
    case 'h':
    {
        match('h');
        break;
    }
    case '.':
    {
        match('.');
        {
        _loop23:
        do {
            if (((LA(1) >= '0' && LA(1) <= '9'))) {
                matchRange('0','9');
            }
            else {
                break _loop23;
            }
        } while (true);
        }
        break;
    }
    default:
    {
    }
    }
This happens with the default value of codeGenMakeSwitchThreshold, which
I presume is 1. When I raise it above the number of alternatives (2 in
this case, so I set it to 3), I get the following code, which tokenizes
correctly without throwing exceptions. I have been running lexers
generated with a higher codeGenMakeSwitchThreshold value on a lot of
Fortran code and so far it hasn't failed me.
    if ((LA(1)=='h')) {
        match('h');
    }
    else {
        boolean synPredMatched55 = false;
        if (((LA(1)=='.'))) {
            int _m55 = mark();
            synPredMatched55 = true;
            inputState.guessing++;
            try {
                {
                match('.');
                {
                if (((LA(1) >= '0' && LA(1) <= '9'))) {
                    matchRange('0','9');
                }
                else if ((_tokenSet_0.member(LA(1)))) {
                    {
                    match(_tokenSet_0);
                    }
                }
                else {
                    throw new NoViableAltForCharException((char)LA(1), getFilename(), getLine(), getColumn());
                }
                }
                }
            }
            catch (RecognitionException pe) {
                synPredMatched55 = false;
            }
            rewind(_m55);
            inputState.guessing--;
        }
        if ( synPredMatched55 ) {
            match('.');
            {
            _loop57:
            do {
                if (((LA(1) >= '0' && LA(1) <= '9'))) {
                    matchRange('0','9');
                }
                else {
                    break _loop57;
                }
            } while (true);
            }
        }
        else {
        }
    }
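
For comparison, the guarded behaviour can be sketched by hand in plain Java. The peek in dotStartsReal stands in for the mark()/rewind() guess above: it inspects the character after the '.' without consuming anything. This is a simplified illustration, not ANTLR output; the names (GuardedNumeralDemo, dotStartsReal, tokenize) are invented, and treating end of input as a failed predicate is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written sketch, NOT ANTLR-generated code. All names are hypothetical.
class GuardedNumeralDemo {
    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Emulates the predicate ('.' ('0'..'9' | ~'g')) => by peeking ahead
    // without consuming input. A digit is never 'g', so the test reduces
    // to "the character after the '.' is not 'g'".
    // Assumption of this sketch: end of input fails the predicate.
    static boolean dotStartsReal(String in, int p) {
        if (p >= in.length() || in.charAt(p) != '.') {
            return false;
        }
        return p + 1 < in.length() && in.charAt(p + 1) != 'g';
    }

    // NUMERAL with the guard in place. Returns the index past the match.
    static int lexNumeral(String in, int pos) {
        int p = pos;
        while (p < in.length() && isDigit(in.charAt(p))) {
            p++;
        }
        if (p == pos) {
            throw new IllegalStateException("expected a digit at index " + pos);
        }
        if (p < in.length() && in.charAt(p) == 'h') {
            p++;                       // hex suffix
        } else if (dotStartsReal(in, p)) {
            p++;                       // consume '.' only if the predicate held
            while (p < in.length() && isDigit(in.charAt(p))) {
                p++;
            }
        }
        return p;
    }

    // Minimal driver covering only the two rules in the test grammar.
    static List<String> tokenize(String in) {
        List<String> tokens = new ArrayList<>();
        int p = 0;
        while (p < in.length()) {
            int end = in.startsWith(".gt.", p) ? p + 4 : lexNumeral(in, p);
            tokens.add(in.substring(p, end));
            p = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("100.gt.1000"));
    }
}
```

With the guard in place the sketch splits "100.gt.1000" into "100", ".gt." and "1000", while an input such as "3.14" is still matched as a single real.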
Does this make things clearer? Let me know if I can help more.
-Olivier