[antlr-interest] Nondeterminism in my grammar files

Sun Dec 10 13:43:11 PST 2006

Hello Everybody,

I'm new to ANTLR, and I appreciate the work that went into this tool.
I'm designing a compiler for a C-like language. I tried my best to avoid
nondeterminism in my grammar, but there are a few cases I could not track down.
At one point I also set the lookahead depth to k = 4. Below is my
scanner/lexer grammar:
options {
    k = 4;
    testLiterals = false;
}

protected LETTER: ('a'..'z' | 'A'..'Z');

protected DIGIT: '0'..'9';

ID options { testLiterals = true; }
    : LETTER (LETTER | DIGIT | '_')*;

NUMBER: (DIGIT)+;

PLUS : '+';
SUB : '-';
MULT : '*';
DIV : '/';
ASSIGN : '=';
EQUALS : "==";
LPAREN : '(';
RPAREN : ')';
LSQRPAREN : '[';
RSQRPAREN : ']';
LBRACE : '{';
RBRACE : '}';
LT : "<";
GT : ">";
NOT : '!';
COMMA : ',';
AND : "&&";
OR : "||";
DOT : '.';
COLON : ':';
SEMI : ';';
PERCENTILE : '%';
COMMENT : ("//") {_ttype = Token.SKIP; };

KEYWORD_INT options { testLiterals = false; }: "int";
KEYWORD_IF options { testLiterals = false; }: "if";
KEYWORD_ELSE options { testLiterals = false; }: "else";
KEYWORD_WITH options { testLiterals = false; }: "with";
KEYWORD_BYTE options { testLiterals = false; }: "byte";
KEYWORD_VOID options { testLiterals = false; }: "void";
KEYWORD_LOOP options { testLiterals = false; }: "loop";
KEYWORD_RETURN options { testLiterals = false; }: "return";
KEYWORD_INTERFACE options { testLiterals = false; }: "interface";
KEYWORD_TRIGGER options { testLiterals = false; }: "trigger";
KEYWORD_ARRAY options { testLiterals = false; }: "array";

NEWLINE: ('\r''\n'|'\n') {newline(); _ttype = SCAN_END; } ;

WS: (' '|'\t')+ {_ttype = Token.SKIP;} ;

#-----------------------------------------------------------------

The warnings I got are:

lexical nondeterminism between rules ID and KEYWORD_INT upon k==1:'i'
k==2:'n' k==3:'t' k==4:<end-of-token>

The other warnings are similar to the one above.
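Note: the warning appears because ID and KEYWORD_INT can both match exactly the same characters ("int" followed by the end of the token), so no amount of lookahead can tell them apart. A commonly used alternative in ANTLR 2 is to drop the KEYWORD_* lexer rules, declare the keywords as string literals instead (for example KEYWORD_INT="int"; in the parser's "tokens" section, provided the lexer shares the parser's token vocabulary), and rely on testLiterals = true on ID. The lexer then matches a whole identifier first and looks its text up in a literals table afterwards. The Java sketch below is not ANTLR-generated code: it is a hypothetical hand-written scanner (the class KeywordScanner and its methods are made up for illustration, and it needs Java 9 or later for Map.ofEntries) that imitates this match-then-look-up order and turns every other character into a one-character token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical hand-written scanner, NOT ANTLR-generated code. It imitates
// the "match a whole identifier, then consult the literals table" order
// that testLiterals = true asks ANTLR 2 for.
class KeywordScanner {
    // Keyword text -> token name (names copied from the grammar above).
    static final Map<String, String> LITERALS = Map.ofEntries(
            Map.entry("int", "KEYWORD_INT"),
            Map.entry("if", "KEYWORD_IF"),
            Map.entry("else", "KEYWORD_ELSE"),
            Map.entry("with", "KEYWORD_WITH"),
            Map.entry("byte", "KEYWORD_BYTE"),
            Map.entry("void", "KEYWORD_VOID"),
            Map.entry("loop", "KEYWORD_LOOP"),
            Map.entry("return", "KEYWORD_RETURN"),
            Map.entry("interface", "KEYWORD_INTERFACE"),
            Map.entry("trigger", "KEYWORD_TRIGGER"),
            Map.entry("array", "KEYWORD_ARRAY"));

    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Returns token names; any other character becomes a one-char token.
    static List<String> scan(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (isLetter(c)) {
                int start = i;
                // Same shape as ID: LETTER (LETTER | DIGIT | '_')*, longest match.
                while (i < src.length() && (isLetter(src.charAt(i))
                        || isDigit(src.charAt(i)) || src.charAt(i) == '_')) {
                    i++;
                }
                String text = src.substring(start, i);
                // Only now, with the whole word read, check the literals table.
                tokens.add(LITERALS.getOrDefault(text, "ID"));
            } else if (isDigit(c)) {
                while (i < src.length() && isDigit(src.charAt(i))) {
                    i++;
                }
                tokens.add("NUMBER");
            } else if (c == ' ' || c == '\t') {
                i++; // skipped, like the WS rule
            } else {
                tokens.add(String.valueOf(c));
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("int integer = 4;"));
    }
}
```

Running main prints [KEYWORD_INT, ID, =, NUMBER, ;]: "integer" is read as one word before the lookup, so it stays an ID, while "int" on its own becomes KEYWORD_INT.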

#-----------------------------------------------------------------

I have the same problem with my parser. I'm trying to build an AST, and
while testing I couldn't build a few of the trees. I think this is caused
by the nondeterminism in the scanner, though maybe my understanding is
still immature. My parser grammar is given below:
tokens {
    TREE_ROOT;
    TREE_VAR;
    VAR_DEC;
    FUNC_DEC;
    FORMALS;
    BLOCK;
    FUNC_CALL;
    IF_ELSE_STMT;
    LOOP_STMT;
    INTERFACE_DEC;
    ARRAY_DEC;
    ARRAY_INDEX;
    TRIGGER_DEC;
}

{
    class ChirpErrorException extends RuntimeException {
        ChirpErrorException(String msg) {
            super(msg);
        }
    }

    private void setLine(AST ast, int line) {
        ChirpAST chirp_ast = (ChirpAST) ast;
        chirp_ast.setLine(line);
    }
}

file: ( var_dec (SCAN_END!)*
      | func_dec (SCAN_END!)*
      | interface_dec (SCAN_END!)*
      | trigger_dec (SCAN_END!)*
      | array_dec (SCAN_END!)*
      )+
    { #file = #([TREE_ROOT, "tree_root"], file); };

var_dec: ( data_type (assignment) (SEMI!) )
    { #var_dec = #([VAR_DEC, "var_dec"], var_dec); };

assignment: ID (ASSIGN^ NUMBER)? ;

data_type: KEYWORD_INT | KEYWORD_BYTE ;

array_dec: (data_type ID (LSQRPAREN! NUMBER RSQRPAREN!) (SEMI!))
    { #array_dec = #([ARRAY_DEC, "array_dec"], array_dec); };

array_index: (LSQRPAREN! ID RSQRPAREN!)
    { #array_index = #([ARRAY_INDEX, "array_index"], array_index); };

func_dec: (func_type ID (LPAREN! ( formals ( COMMA! formals)* )? RPAREN!)
          (SCAN_END!)* block)
    { #func_dec = #([FUNC_DEC, "func_dec"], func_dec); };

func_type: data_type | KEYWORD_VOID ;

formals: (data_type ID) { #formals = #([FORMALS, "formals"], formals); };

block: ( LBRACE! (SCAN_END!)* ( statement )+ (SCAN_END!)* RBRACE! )
    { #block = #([BLOCK, "block"], block); };

statement: assign_stmt (SEMI!) (SCAN_END!)*
         | block (SCAN_END!)*
         | if_else_stmt (SCAN_END!)*
         | func_call (SEMI!) (SCAN_END!)*
         | loop_stmt (SCAN_END!)*
         | return_stmt (SEMI!) (SCAN_END!)* ;

assign_stmt: ID ( array_index )? ASSIGN^ expr;

expr: NOT rel_expr | rel_expr (( AND^ | OR^ ) rel_expr )* ;

rel_expr: arith_expr (( EQUALS^ | LT^ | GT^ ) arith_expr )* ;

arith_expr: term (( PLUS^ | SUB^ ) term )* ;

term: factor (( MULT^ | DIV^ | PERCENTILE^ ) factor )* ;

factor: ID ( LSQRPAREN! expr RSQRPAREN! )?
      | NUMBER
      | LPAREN! expr RPAREN!
      | func_call ;

func_call: ( ID (DOT ID)* (LPAREN! ( expr ( COMMA! expr )* )? RPAREN!) )
    { #func_call = #([FUNC_CALL, "func_call"], func_call); };

if_else_stmt: ( KEYWORD_IF (LPAREN! expr RPAREN!) (SCAN_END!)* statement
                (options {greedy = true;}: KEYWORD_ELSE statement)? )
    { #if_else_stmt = #([IF_ELSE_STMT, "if_else_stmt"], if_else_stmt); };

loop_stmt: (KEYWORD_LOOP ( LPAREN! expr RPAREN! )? block
            (KEYWORD_WITH ID (SEMI!))? )
    { #loop_stmt = #([LOOP_STMT, "loop_stmt"], loop_stmt); };

return_stmt: KEYWORD_RETURN ( expr )? ;

interface_dec: (KEYWORD_INTERFACE ID (DOT ID)*
                (LBRACE! ((SCAN_END!)* (func_dec)+ (SCAN_END!)*)+ RBRACE!))
    { #interface_dec = #([INTERFACE_DEC, "interface_dec"], interface_dec); };

trigger_dec: (KEYWORD_TRIGGER ID (LBRACE! (SCAN_END!)*
              LPAREN! expr RPAREN! (SCAN_END!)* RBRACE!) COLON! block)
    { #trigger_dec = #([TRIGGER_DEC, "trigger_dec"], trigger_dec); };
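Since some of the trees are not coming out as expected, it may help to write down by hand what one rule should produce. In ANTLR 2 tree construction, a "^" suffix makes a token the root of the current subtree, a "!" suffix leaves the token out of the tree, and #(root, list) makes "list" the children of "root". For the input "int x = 5;", var_dec above should therefore yield a var_dec node with two children: "int" and the subtree rooted at "=". The following Java sketch models that shape with a hypothetical child-sibling class, AstNode, made up for illustration (ANTLR 2's own trees are also child-sibling, but this is not its API).

```java
// Hypothetical child-sibling tree node, only to illustrate the expected AST
// shape; it is NOT antlr.CommonAST or any other ANTLR class.
class AstNode {
    final String text;
    AstNode firstChild;
    AstNode nextSibling;

    AstNode(String text) {
        this.text = text;
    }

    // Links the given nodes into a sibling list and returns its head,
    // roughly what a rule's AST looks like before any #(...) rewrite.
    static AstNode chain(AstNode... nodes) {
        for (int i = 0; i + 1 < nodes.length; i++) {
            nodes[i].nextSibling = nodes[i + 1];
        }
        return nodes.length == 0 ? null : nodes[0];
    }

    // Makes a sibling list the children of root, like #(root, list).
    static AstNode make(AstNode root, AstNode children) {
        root.firstChild = children;
        return root;
    }

    // LISP-style dump: a leaf prints its text, a subtree "(root child ...)".
    String toStringTree() {
        if (firstChild == null) {
            return text;
        }
        StringBuilder sb = new StringBuilder("(").append(text);
        for (AstNode c = firstChild; c != null; c = c.nextSibling) {
            sb.append(' ').append(c.toStringTree());
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // assignment: ID (ASSIGN^ NUMBER)? on "x = 5": ASSIGN becomes the root.
        AstNode assign = make(new AstNode("="), chain(new AstNode("x"), new AstNode("5")));
        // var_dec: data_type assignment SEMI! gives siblings [int, (= x 5)]; ';' dropped.
        AstNode varDecList = chain(new AstNode("int"), assign);
        // Corresponds to #var_dec = #([VAR_DEC, "var_dec"], var_dec).
        AstNode varDec = make(new AstNode("var_dec"), varDecList);
        System.out.println(varDec.toStringTree());
    }
}
```

Running main prints (var_dec int (= x 5)). Comparing a hand-written expectation like this with what the generated parser actually builds for the same input (ANTLR 2 ASTs also offer a toStringTree() method, although its output format differs slightly) is one way to find which trees are not built as intended.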

#--------------------------------------------------------------------------------------------

The warnings I got in the parser come from the "statement" rule:

nondeterminism upon k==1:SCAN_END k==2:SCAN_END, ID, LBRACE, RBRACE,
KEYWORD_IF, KEYWORD_ELSE, KEYWORD_LOOP, KEYWORD_RETURN
k==3:EOF, ID, SCAN_END......KEYWORD_INTERFACE, KEYWORD_TRIGGER
between alt 1 and exit branch of block.
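As I read this message, "block" refers to the (SCAN_END!)* subrule at the end of a statement alternative, not to the block rule. Alt 1 means matching one more SCAN_END inside that loop; the exit branch means leaving the loop so that the rule which called statement can match the SCAN_END itself, for example the (SCAN_END!)* after ( statement )+ in block. Both are possible on the same input, hence the warning. As far as I know, ANTLR 2 resolves this kind of loop conflict greedily (the loop keeps matching), and options { greedy = true; }, as already used for the else branch, only suppresses the warning without changing that behaviour; removing one of the two overlapping (SCAN_END!)* loops removes the ambiguity itself. The following Java sketch is a hypothetical hand-written parser over token names (not ANTLR output, with statement cut down to ID SEMI) that applies the greedy choice and counts how many SCAN_ENDs reach the loop after ( statement )+ in block.

```java
import java.util.List;

// Hypothetical hand-written recursive-descent parser over token names; NOT
// ANTLR output. It keeps only the SCAN_END handling of block and statement.
class NewlineLoops {
    private final List<String> toks;
    private int pos = 0;
    // How many SCAN_ENDs the loop after ( statement )+ in block matched.
    int blockTailNewlines = 0;

    NewlineLoops(List<String> toks) {
        this.toks = toks;
    }

    private String la() {
        return pos < toks.size() ? toks.get(pos) : "EOF";
    }

    private void match(String expected) {
        if (!la().equals(expected)) {
            throw new IllegalStateException("expected " + expected + ", got " + la());
        }
        pos++;
    }

    // block: LBRACE (SCAN_END)* ( statement )+ (SCAN_END)* RBRACE
    void block() {
        match("LBRACE");
        while (la().equals("SCAN_END")) {
            match("SCAN_END");
        }
        do {
            statement();
        } while (la().equals("ID"));
        // statement() has already consumed every SCAN_END, so this loop never runs.
        while (la().equals("SCAN_END")) {
            match("SCAN_END");
            blockTailNewlines++;
        }
        match("RBRACE");
    }

    // Simplified statement: ID SEMI (SCAN_END)*, standing in for assign_stmt SEMI.
    void statement() {
        match("ID");
        match("SEMI");
        // Greedy choice: take "alt 1" (match SCAN_END) whenever one is next,
        // instead of taking the exit branch and leaving it to the caller.
        while (la().equals("SCAN_END")) {
            match("SCAN_END");
        }
    }

    public static void main(String[] args) {
        NewlineLoops p = new NewlineLoops(List.of("LBRACE", "SCAN_END",
                "ID", "SEMI", "SCAN_END",
                "ID", "SEMI", "SCAN_END", "SCAN_END",
                "RBRACE"));
        p.block();
        System.out.println(p.blockTailNewlines);
    }
}
```

Running main prints 0: once statement matches newlines greedily, the trailing (SCAN_END)* loop in block never gets a SCAN_END to match, which is why one of the two loops is redundant.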

The other 5 warnings come from the same nonterminal.

#------------------------------------------------------------------------------------------

Please help me trace the problem.

Thanks in advance,

Vinay