[antlr-interest] Simple nondeterminism help

Sun Oct 26 15:53:50 PST 2003

Thanks, it was very useful.

----- Original Message ----- 
From: <jbb at acm.org>
To: <antlr-interest at yahoogroups.com>
Sent: Tuesday, October 21, 2003 1:20 AM
Subject: Re: [antlr-interest] Simple nondeterminism help

> Hi Nico!
>
> This is probably way too long a reply, but here goes...
>
> You asked (in summary):
> >.....but I still have
> >nondeterminism in number RULE.
> >
> >I am converting a grammar from an BNF And I always have this
nondeterminism
> >errors.
> >
> >Does anybody know some guidelines to avoid this?
>
> ....grammar sniped....
>
> I think that the general concepts of "Left Factoring" and probably
> "Left Recursion Removal" form the sort of guidelines to help avoid
> non-determinism problems in LL grammars.
>
> Any decent compiler book should discuss these concepts.
>
> The rest of this message discusses how I fixed your non-determinism
> problem...
>
> First here is your original grammar with repairs in order to make the
> lexer supply all of the tokens that the parser needs. I hope you will
> agree that this grammar is the same as yours...
>
> -----Begin S2.g-----
> class S2Parser extends Parser;
>
> // nondeterminism here !!
> number :
> (digits LOWER_R)?    // a
> (MINUS)?    // b
> bigDigits    // c
> (DOT bigDigits)?    // d
> (LOWER_E (MINUS)? digits)? // e
>     ;
>
> bigDigits: (bigDigit)+ ;
>
> bigDigit: DIGIT | CAPITALLETTER ;
>
> letter: lowerletter | CAPITALLETTER ;
>
> lowerletter : LOWER_A_D | LOWER_E | LOWER_F_Q | LOWER_R | LOWER_S_Z ;
>
> digits: (DIGIT)+ ;
>
> class S2Lexer extends Lexer;
>
> DOT : '.' ;
> MINUS : '-' ;
>
> DIGIT : '0'..'9' ;
> CAPITALLETTER: 'A'..'Z' ;
>
> LOWER_A_D : 'a'..'d' ;
> LOWER_E   : 'e'      ;
> LOWER_F_Q : 'f'..'q' ;
> LOWER_R   : 'r'      ;
> LOWER_S_Z : 's'..'z' ;
> -----End S2.g-----
>
> and here are some strings that are (apparently) valid numbers under
> your grammar - will use these for testing.
>
> -----Begin test.txt-----
> 2
> -2
> 2.0
> A
> -A
> A.B
> 2r1
> 2r-1
> 2r-1.3
> 2r-1.3e4
> 2r-1.3e-44
> 2rA
> 2r-A
> 2r-A.Be9
> 2r-A.Be-9
> -----End test.txt-----
>
> Our non-determinism problem is that at the beginning of processing of
> the number rule. If the lexer returns a DIGIT token, the parser can
> not know whether that DIGIT is part of the optional 'r' phrase (which
> I marked with the comment // a) or is that DIGIT part of the required
> bigDigits phrase (which I marked with a // c).
>
> And so we must left factor the first 3 phrases of the number
> (e.g. what I marked as a, b, and c).
>
> And here is a grammar that has that left factoring:
>
> -----Begin S2_1.g-----
> class S2Parser extends Parser;
>
> entry : (number EOL)+ EOF;
>
> number : number_front number_tail ;
>
> number_front :
>       (DIGIT)+
> ( LOWER_R (MINUS)? (bigDigit)+
> | CAPITALLETTER (bigDigit)* )?
>     | MINUS (bigDigit)+
>     | CAPITALLETTER (bigDigit)*
>     ;
>
> number_tail : (DOT (bigDigit)+)? (LOWER_E (MINUS)? (DIGIT)+)? ;
>
> bigDigit: DIGIT | CAPITALLETTER ;
>
> letter: lowerletter | CAPITALLETTER ;
>
> lowerletter : LOWER_A_D | LOWER_E | LOWER_F_Q | LOWER_R | LOWER_S_Z ;
>
> class S2Lexer extends Lexer;
>
> options {
>     k = 2;
> }
>
> DOT : '.' ;
> MINUS : '-' ;
>
> DIGIT : '0'..'9' ;
> CAPITALLETTER: 'A'..'Z' ;
>
> LOWER_A_D : 'a'..'d' ;
> LOWER_E   : 'e'      ;
> LOWER_F_Q : 'f'..'q' ;
> LOWER_R   : 'r'      ;
> LOWER_S_Z : 's'..'z' ;
>
> WS  : ( ' '|'\t') { $setType(Token.SKIP); } ;
> EOL : "\r\n"|'\r'|'\n' { newline(); } ;
> -----End S2_1.g-----
>
> Run this grammar with this Main using the test.txt as input and see no
> errors...
>
> -----Begin Main.java-----
> import java.io.*;
>
> class Main {
>     public static void main(String[] args) {
> try {
>     S2Lexer lexer =
> new S2Lexer(new DataInputStream(System.in));
>     S2Parser parser = new S2Parser(lexer);
>     parser.entry();
> } catch(Exception e) {
>     System.err.println("exception: "+e);
> }
>     }
> }
> -----End Main.java-----
>
> And lastly, I believe that the number rule (and all its attendant
> sub-rules) should really be placed in the lexer and not in the
> parser. But haven't really thought about how to do that.
>
> Hope this helps....
>
> -jbb
>
>
>
> Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
>
>

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/