[antlr-interest] Context-Sensitive Follow Sets.. Bug?

Sun May 23 06:29:54 PDT 2010

Hello everyone..

After reading the wiki article
http://www.antlr.org/wiki/display/ANTLR3/Custom+Syntax+Error+Recovery
(thanks to Jim Idle, "Yes, you def. deserve a Masters too" ;-)), I went on
to develop my own example to test the method
computeContextSensitiveRuleFOLLOW(), described here:
http://www.antlr.org/api/Java/classorg_1_1antlr_1_1runtime_1_1_base_recognizer.html#2b566e00e5d771f66dd4e29a4a27a1c4

The method works perfectly in all cases except for an optional ("zero or
one") subrule. Consider the following simple grammar:

    start    : animal (AND acClass)? service EOF;

    animal   : (DOG | CAT);

    service  : (HARDWARE | SOFTWARE);

    AND      : 'and';
    DOG      : 'dog';
    CAT      : 'cat';
    HARDWARE : 'hardware';
    SOFTWARE : 'software';

    acClass
    @init
    { System.out.println(computeContextSensitiveRuleFOLLOW().toString()); }
        :     ;

Testing this grammar with, say, the input
"dog and software",
the console prints
"{4, 7, 8}" (which stands for the tokens {AND, HARDWARE, SOFTWARE}),
although it should be
"{7,8}" (which stands for the tokens {HARDWARE, SOFTWARE} only), because
if "and" is the next token after acClass in the start rule, the input is
invalid.
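
To make the printed set easier to read, here is a minimal standalone sketch that maps the numeric token types back to names. It does not use the ANTLR runtime. The TOKEN_NAMES array is an assumption modelled on how ANTLR 3 normally lays out the generated tokenNames array (four reserved entries, then user tokens numbered from 4 in order of first appearance, so AND = 4, HARDWARE = 7, SOFTWARE = 8). The class name FollowSetNames and the decode helper are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class FollowSetNames {
    // Assumption: mirrors the tokenNames array ANTLR 3 would generate for
    // the grammar above. Indices 0..3 are reserved by the runtime.
    static final String[] TOKEN_NAMES = {
        "<invalid>", "<EOR>", "<DOWN>", "<UP>",
        "AND", "DOG", "CAT", "HARDWARE", "SOFTWARE"
    };

    // Hypothetical helper: turns token type numbers into a readable set.
    public static String decode(int[] tokenTypes) {
        List<String> names = new ArrayList<>();
        for (int type : tokenTypes) {
            boolean known = type >= 0 && type < TOKEN_NAMES.length;
            names.add(known ? TOKEN_NAMES[type] : "<unknown " + type + ">");
        }
        return "{" + String.join(", ", names) + "}";
    }

    public static void main(String[] args) {
        // The set reported by the parser in the test above.
        System.out.println(decode(new int[] {4, 7, 8}));
        // The set that was expected.
        System.out.println(decode(new int[] {7, 8}));
    }
}
```

Under that assumed numbering, the first line prints {AND, HARDWARE, SOFTWARE} and the second prints {HARDWARE, SOFTWARE}. Inside the grammar action itself, passing getTokenNames() to the follow set's toString should give the same readable form, assuming the runtime's BitSet provides a toString(String[] tokenNames) overload.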

Any idea why this happens? Or how we can overcome it?

Thanks a bunch..
-- 
Sameh W. Zaky