[antlr-interest] Context-Sensitive Follow Sets.. Bug?

Sun May 23 10:06:16 PDT 2010

Here is the same message without formatting:

---------- Forwarded message ----------
From: Sameh W. Zaky <sameh.wz at gmail.com>
Date: Sun, May 23, 2010 at 3:29 PM
Subject: [antlr-interest] Context-Sensitive Follow Sets.. Bug?
To: antlr-interest at antlr.org

Hello everyone,
After reading the wiki article
http://www.antlr.org/wiki/display/ANTLR3/Custom+Syntax+Error+Recovery
(thanks to Jim Idle, "Yes, you def. deserve a Masters too" ;-)), I wrote my
own example to test the method computeContextSensitiveRuleFOLLOW()
(described here:
http://www.antlr.org/api/Java/classorg_1_1antlr_1_1runtime_1_1_base_recognizer.html#2b566e00e5d771f66dd4e29a4a27a1c4
).
The method works perfectly in every case I tried except an optional
(zero-or-one) subrule. Consider the following simple grammar:

=====================================================================================

start : animal (AND acClass)? service EOF;

animal  : (DOG | CAT);
service : (HARDWARE | SOFTWARE);

AND      : 'and';
DOG      : 'dog';
CAT      : 'cat';
HARDWARE : 'hardware';
SOFTWARE : 'software';

// Empty rule: matches nothing and only prints the FOLLOW set computed on entry.
acClass
@init
{ System.out.println(computeContextSensitiveRuleFOLLOW().toString()); }
    :
    ;
=====================================================================================

Testing this grammar with, say, the input
"dog and software",
the console prints
"{4, 7, 8}" (which stands for the tokens {AND, HARDWARE, SOFTWARE}),
although it should be
"{7, 8}" (which stands for the tokens {HARDWARE, SOFTWARE} only), because if
"and" is the next token after acClass in the start rule, the input is
invalid.
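
To make the numbers in the printed set easier to read, here is a minimal sketch that maps token type numbers back to token names. It rests on two assumptions that are not stated in the message: that ANTLR 3 starts user-defined token types at 4, and that the types are assigned in the order AND, DOG, CAT, HARDWARE, SOFTWARE for the grammar above. The class FollowSetDecoder and its decode() helper are hypothetical names used only for illustration; they are not part of the ANTLR runtime.

```java
import java.util.ArrayList;
import java.util.List;

class FollowSetDecoder {
    // Assumption: ANTLR 3 reserves token types 0..3, so user tokens start at 4.
    static final int MIN_TOKEN_TYPE = 4;

    // Assumption: token types are assigned in this order for the grammar above.
    static final String[] TOKEN_NAMES = {"AND", "DOG", "CAT", "HARDWARE", "SOFTWARE"};

    // Translate token type numbers into token names; unknown types are flagged.
    static List<String> decode(int... tokenTypes) {
        List<String> names = new ArrayList<>();
        for (int type : tokenTypes) {
            int index = type - MIN_TOKEN_TYPE;
            if (index < 0 || index >= TOKEN_NAMES.length) {
                names.add("<unknown:" + type + ">");
            } else {
                names.add(TOKEN_NAMES[index]);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(decode(4, 7, 8)); // the set reported by the parser
        System.out.println(decode(7, 8));    // the set that was expected
    }
}
```

Under those assumptions, type 4 decodes to AND, so the reported set differs from the expected one only by the token that opens the optional subrule. Inside the grammar action itself, the runtime's getTokenNames() could probably be used for the same purpose, but that is untested here.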

Any idea why this happens? Or how we can overcome it?
Thanks a bunch..
--
Sameh W. Zaky
