[antlr-interest] Simple nondeterminism help
jbb at acm.org
jbb at acm.org
Mon Oct 20 21:20:18 PDT 2003
Hi Nico!
This is probably way too long a reply, but here goes...
You asked (in summary):
>.....but I still have
>nondeterminism in number RULE.
>
>I am converting a grammar from an BNF And I always have this nondeterminism
>errors.
>
>Does anybody know some guidelines to avoid this?
....grammar sniped....
I think that the general concepts of "Left Factoring" and probably
"Left Recursion Removal" form the sort of guidelines to help avoid
non-determinism problems in LL grammars.
Any decent compiler book should discuss these concepts.
The rest of this message discusses how I fixed your non-determinism
problem...
First here is your original grammar with repairs in order to make the
lexer supply all of the tokens that the parser needs. I hope you will
agree that this grammar is the same as yours...
-----Begin S2.g-----
class S2Parser extends Parser;
// nondeterminism here !!
number :
(digits LOWER_R)? // a
(MINUS)? // b
bigDigits // c
(DOT bigDigits)? // d
(LOWER_E (MINUS)? digits)? // e
;
bigDigits: (bigDigit)+ ;
bigDigit: DIGIT | CAPITALLETTER ;
letter: lowerletter | CAPITALLETTER ;
lowerletter : LOWER_A_D | LOWER_E | LOWER_F_Q | LOWER_R | LOWER_S_Z ;
digits: (DIGIT)+ ;
class S2Lexer extends Lexer;
DOT : '.' ;
MINUS : '-' ;
DIGIT : '0'..'9' ;
CAPITALLETTER: 'A'..'Z' ;
LOWER_A_D : 'a'..'d' ;
LOWER_E : 'e' ;
LOWER_F_Q : 'f'..'q' ;
LOWER_R : 'r' ;
LOWER_S_Z : 's'..'z' ;
-----End S2.g-----
and here are some strings that are (apparently) valid numbers under
your grammar - will use these for testing.
-----Begin test.txt-----
2
-2
2.0
A
-A
A.B
2r1
2r-1
2r-1.3
2r-1.3e4
2r-1.3e-44
2rA
2r-A
2r-A.Be9
2r-A.Be-9
-----End test.txt-----
Our non-determinism problem is that at the beginning of processing of
the number rule. If the lexer returns a DIGIT token, the parser can
not know whether that DIGIT is part of the optional 'r' phrase (which
I marked with the comment // a) or is that DIGIT part of the required
bigDigits phrase (which I marked with a // c).
And so we must left factor the first 3 phrases of the number
(e.g. what I marked as a, b, and c).
And here is a grammar that has that left factoring:
-----Begin S2_1.g-----
class S2Parser extends Parser;
entry : (number EOL)+ EOF;
number : number_front number_tail ;
number_front :
(DIGIT)+
( LOWER_R (MINUS)? (bigDigit)+
| CAPITALLETTER (bigDigit)* )?
| MINUS (bigDigit)+
| CAPITALLETTER (bigDigit)*
;
number_tail : (DOT (bigDigit)+)? (LOWER_E (MINUS)? (DIGIT)+)? ;
bigDigit: DIGIT | CAPITALLETTER ;
letter: lowerletter | CAPITALLETTER ;
lowerletter : LOWER_A_D | LOWER_E | LOWER_F_Q | LOWER_R | LOWER_S_Z ;
class S2Lexer extends Lexer;
options {
k = 2;
}
DOT : '.' ;
MINUS : '-' ;
DIGIT : '0'..'9' ;
CAPITALLETTER: 'A'..'Z' ;
LOWER_A_D : 'a'..'d' ;
LOWER_E : 'e' ;
LOWER_F_Q : 'f'..'q' ;
LOWER_R : 'r' ;
LOWER_S_Z : 's'..'z' ;
WS : ( ' '|'\t') { $setType(Token.SKIP); } ;
EOL : "\r\n"|'\r'|'\n' { newline(); } ;
-----End S2_1.g-----
Run this grammar with this Main using the test.txt as input and see no
errors...
-----Begin Main.java-----
import java.io.*;
class Main {
public static void main(String[] args) {
try {
S2Lexer lexer =
new S2Lexer(new DataInputStream(System.in));
S2Parser parser = new S2Parser(lexer);
parser.entry();
} catch(Exception e) {
System.err.println("exception: "+e);
}
}
}
-----End Main.java-----
And lastly, I believe that the number rule (and all its attendant
sub-rules) should really be placed in the lexer and not in the
parser. But haven't really thought about how to do that.
Hope this helps....
-jbb
Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
More information about the antlr-interest
mailing list