[antlr-interest] Simple nondeterminism help

Mon Oct 20 21:20:18 PDT 2003

Hi Nico!

This is probably way too long a reply, but here goes...

You asked (in summary):
>.....but I still have
>nondeterminism in number RULE.
>
>I am converting a grammar from an BNF And I always have this nondeterminism
>errors.
>
>Does anybody know some guidelines to avoid this?

....grammar sniped....

I think that the general concepts of "Left Factoring" and probably
"Left Recursion Removal" form the sort of guidelines to help avoid
non-determinism problems in LL grammars.

Any decent compiler book should discuss these concepts.

The rest of this message discusses how I fixed your non-determinism
problem...

First here is your original grammar with repairs in order to make the
lexer supply all of the tokens that the parser needs. I hope you will
agree that this grammar is the same as yours...

-----Begin S2.g-----
class S2Parser extends Parser;

// nondeterminism here !!
number :
	(digits LOWER_R)?	   // a
	(MINUS)?		   // b
	bigDigits		   // c
	(DOT bigDigits)?	   // d
	(LOWER_E (MINUS)? digits)? // e
    ;

bigDigits: (bigDigit)+ ;

bigDigit: DIGIT | CAPITALLETTER ;

letter: lowerletter | CAPITALLETTER ;

lowerletter : LOWER_A_D | LOWER_E | LOWER_F_Q | LOWER_R | LOWER_S_Z ;

digits: (DIGIT)+ ;

class S2Lexer extends Lexer;

DOT : '.' ;
MINUS : '-' ;

DIGIT : '0'..'9' ;
CAPITALLETTER: 'A'..'Z' ;

LOWER_A_D : 'a'..'d' ;
LOWER_E   : 'e'      ;
LOWER_F_Q : 'f'..'q' ;
LOWER_R   : 'r'      ;
LOWER_S_Z : 's'..'z' ;
-----End S2.g-----

and here are some strings that are (apparently) valid numbers under
your grammar - will use these for testing.

-----Begin test.txt-----
2
-2
2.0
A
-A
A.B
2r1
2r-1
2r-1.3
2r-1.3e4
2r-1.3e-44
2rA
2r-A
2r-A.Be9
2r-A.Be-9
-----End test.txt-----

Our non-determinism problem is that at the beginning of processing of
the number rule. If the lexer returns a DIGIT token, the parser can
not know whether that DIGIT is part of the optional 'r' phrase (which
I marked with the comment // a) or is that DIGIT part of the required
bigDigits phrase (which I marked with a // c).

And so we must left factor the first 3 phrases of the number
(e.g. what I marked as a, b, and c).

And here is a grammar that has that left factoring:

-----Begin S2_1.g-----
class S2Parser extends Parser;

entry : (number EOL)+ EOF;

number : number_front number_tail ;

number_front :
      (DIGIT)+
	( LOWER_R (MINUS)? (bigDigit)+
	| CAPITALLETTER (bigDigit)* )?
    | MINUS (bigDigit)+
    | CAPITALLETTER (bigDigit)*
    ;

number_tail : (DOT (bigDigit)+)? (LOWER_E (MINUS)? (DIGIT)+)? ;

bigDigit: DIGIT | CAPITALLETTER ;

letter: lowerletter | CAPITALLETTER ;

lowerletter : LOWER_A_D | LOWER_E | LOWER_F_Q | LOWER_R | LOWER_S_Z ;

class S2Lexer extends Lexer;

options {
    k = 2;
}

DOT : '.' ;
MINUS : '-' ;

DIGIT : '0'..'9' ;
CAPITALLETTER: 'A'..'Z' ;

LOWER_A_D : 'a'..'d' ;
LOWER_E   : 'e'      ;
LOWER_F_Q : 'f'..'q' ;
LOWER_R   : 'r'      ;
LOWER_S_Z : 's'..'z' ;

WS  : ( ' '|'\t') { $setType(Token.SKIP); } ;
EOL : "\r\n"|'\r'|'\n' { newline(); } ;
-----End S2_1.g-----

Run this grammar with this Main using the test.txt as input and see no
errors...

-----Begin Main.java-----
import java.io.*;

class Main {
    public static void main(String[] args) {
	try {
	    S2Lexer lexer =
		new S2Lexer(new DataInputStream(System.in));
	    S2Parser parser = new S2Parser(lexer);
	    parser.entry();
	} catch(Exception e) {
	    System.err.println("exception: "+e);
	}
    }
}
-----End Main.java-----

And lastly, I believe that the number rule (and all its attendant
sub-rules) should really be placed in the lexer and not in the
parser. But haven't really thought about how to do that.

Hope this helps....

	-jbb

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/