[antlr-interest] Nondeterminism in my grammar files
Vinay Veeramachaneni
virtuoso.vin at gmail.com
Sun Dec 10 13:43:11 PST 2006
Hello everybody,
I'm new to ANTLR, and I appreciate the work that went into this tool.
I'm designing a compiler for a C-like language. I tried my best to avoid
nondeterminism in my grammar, but there are a few cases I could not track down.
At one point I also set the lookahead to k = 4. Below is my
scanner/lexer grammar:
options {
    k = 4;
    testLiterals = false;
}

protected LETTER: ('a'..'z' | 'A'..'Z');

protected DIGIT: '0'..'9';
ID options { testLiterals = true; }
    : LETTER (LETTER | DIGIT | '_')*;
NUMBER: (DIGIT)+;
PLUS : '+';
SUB : '-';
MULT : '*';
DIV : '/';
ASSIGN : '=';
EQUALS : "==";
LPAREN : '(';
RPAREN : ')';
LSQRPAREN : '[';
RSQRPAREN : ']';
LBRACE : '{';
RBRACE : '}';
LT : "<";
GT : ">";
NOT : '!';
COMMA : ',';
AND : "&&";
OR : "||";
DOT : '.';
COLON : ':';
SEMI : ';';
PERCENTILE : '%';
COMMENT : ("//") {_ttype = Token.SKIP; };
KEYWORD_INT options { testLiterals = false; }: "int";
KEYWORD_IF options { testLiterals = false; }: "if";
KEYWORD_ELSE options { testLiterals = false; }: "else";
KEYWORD_WITH options { testLiterals = false; }: "with";
KEYWORD_BYTE options { testLiterals = false; }: "byte";
KEYWORD_VOID options { testLiterals = false; }: "void";
KEYWORD_LOOP options { testLiterals = false; }: "loop";
KEYWORD_RETURN options { testLiterals = false; }: "return";
KEYWORD_INTERFACE options { testLiterals = false; }: "interface";
KEYWORD_TRIGGER options { testLiterals = false; }: "trigger";
KEYWORD_ARRAY options { testLiterals = false; }: "array";
NEWLINE: ('\r''\n'|'\n') {newline(); _ttype = SCAN_END; } ;
WS: (' '|'\t')+ {_ttype = Token.SKIP;} ;
#-----------------------------------------------------------------
The warnings I got are:
lexical nondeterminism between rules ID and KEYWORD_INT upon k==1:'i'
k==2:'n' k==3:'t' k==4:<end-of-token>
The other warnings are similar to this one.
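The warning says that for the input `int`, both ID and KEYWORD_INT can match every character up to end-of-token, so no fixed lookahead can separate them. The `testLiterals = true` option on ID is the hook for a different arrangement: match the identifier once, then check the matched text against a table of literals. The Java sketch below illustrates that idea only. The class and method names (`KeywordLookup`, `matchIdentifier`, `classify`) and the `KEYWORD_*` naming in the table are hypothetical, and this is not ANTLR-generated code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class KeywordLookup {
    // Keyword text -> token type name (hypothetical scheme mirroring the KEYWORD_* rules).
    private static final Map<String, String> LITERALS = new HashMap<>();

    static {
        String[] words = {"int", "if", "else", "with", "byte", "void",
                          "loop", "return", "interface", "trigger", "array"};
        for (String w : words) {
            // Locale.ROOT keeps "int" -> "INT" independent of the default locale.
            LITERALS.put(w, "KEYWORD_" + w.toUpperCase(Locale.ROOT));
        }
    }

    // Mirrors ID: LETTER (LETTER | DIGIT | '_')*. Returns the index just past
    // the longest match starting at pos, or pos itself if nothing matches.
    static int matchIdentifier(String src, int pos) {
        if (pos >= src.length() || !isLetter(src.charAt(pos))) {
            return pos;
        }
        int end = pos + 1;
        while (end < src.length()) {
            char c = src.charAt(end);
            if (isLetter(c) || (c >= '0' && c <= '9') || c == '_') {
                end++;
            } else {
                break;
            }
        }
        return end;
    }

    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Called once the whole lexeme is known: a table hit is a keyword, a miss is ID.
    static String classify(String lexeme) {
        return LITERALS.getOrDefault(lexeme, "ID");
    }
}
```

With this split, `int` and `integer` follow the same identifier path. Only the final lookup tells `KEYWORD_INT` apart from `ID`, so a longer identifier is never cut short and there is only one rule matching identifier-shaped text.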
#-----------------------------------------------------------------
I have the same problem with my parser. I'm trying to build an AST, and while
testing I couldn't build some of the trees. I suspect this is caused by the
nondeterminism in the scanner, though my reasoning may be naive. My parser
grammar is given below:
tokens {
TREE_ROOT;
TREE_VAR;
VAR_DEC;
FUNC_DEC;
FORMALS;
BLOCK;
FUNC_CALL;
IF_ELSE_STMT;
LOOP_STMT;
INTERFACE_DEC;
ARRAY_DEC;
ARRAY_INDEX;
TRIGGER_DEC;
}
{
class ChirpErrorException extends RuntimeException {
    ChirpErrorException(String msg) {
        super(msg);
    }
}
private void setLine(AST ast, int line) {
    ChirpAST chirp_ast = (ChirpAST) ast;
    chirp_ast.setLine(line);
}
}
file: ( var_dec (SCAN_END!)* | func_dec (SCAN_END!)*
      | interface_dec (SCAN_END!)* | trigger_dec (SCAN_END!)*
      | array_dec (SCAN_END!)* )+
      { #file=#([TREE_ROOT, "tree_root"], file);};
var_dec: ( data_type (assignment) (SEMI!) )
{ #var_dec=#([VAR_DEC, "var_dec"], var_dec);};
assignment: ID (ASSIGN^ NUMBER)? ;
data_type: KEYWORD_INT | KEYWORD_BYTE ;
array_dec: (data_type ID (LSQRPAREN! NUMBER RSQRPAREN!) (SEMI!))
{ #array_dec=#([ARRAY_DEC, "array_dec"], array_dec);};
array_index: (LSQRPAREN! ID RSQRPAREN!)
      { #array_index=#([ARRAY_INDEX, "array_index"], array_index);};
func_dec: (func_type ID (LPAREN! ( formals ( COMMA! formals)* )? RPAREN!)
(SCAN_END!)* block)
{ #func_dec = #([FUNC_DEC, "func_dec"],func_dec);};
func_type: data_type | KEYWORD_VOID ;
formals: (data_type ID) { #formals=#([FORMALS, "formals"], formals);};
block: ( LBRACE! (SCAN_END!)* ( statement )+ (SCAN_END!)* RBRACE! )
{ #block=#([BLOCK, "block"], block);};
statement: assign_stmt (SEMI!) (SCAN_END!)*
         | block (SCAN_END!)*
         | if_else_stmt (SCAN_END!)*
         | func_call (SEMI!) (SCAN_END!)*
         | loop_stmt (SCAN_END!)*
         | return_stmt (SEMI!) (SCAN_END!)* ;
assign_stmt: ID ( array_index )? ASSIGN^ expr;
expr: NOT rel_expr | rel_expr (( AND^ | OR^ ) rel_expr )* ;
rel_expr: arith_expr (( EQUALS^ | LT^ | GT^ ) arith_expr )* ;
arith_expr: term (( PLUS^ | SUB^ ) term )* ;
term: factor (( MULT^ | DIV^ | PERCENTILE^ ) factor )* ;
factor: ID ( LSQRPAREN! expr RSQRPAREN! )? | NUMBER | LPAREN! expr RPAREN! |
func_call ;
func_call: ( ID (DOT ID)* (LPAREN! ( expr ( COMMA! expr )* )? RPAREN!))
{ #func_call = #([FUNC_CALL, "func_call"],func_call);};
if_else_stmt: ( KEYWORD_IF (LPAREN! expr RPAREN!) (SCAN_END!)* statement
(options {greedy = true;}: KEYWORD_ELSE statement)? )
{ #if_else_stmt = #([IF_ELSE_STMT, "if_else_stmt"],if_else_stmt);};
loop_stmt: (KEYWORD_LOOP ( LPAREN! expr RPAREN!)? block (KEYWORD_WITH ID
(SEMI!))? )
{ #loop_stmt = #([LOOP_STMT, "loop_stmt"],loop_stmt);};
return_stmt: KEYWORD_RETURN ( expr )? ;
interface_dec: (KEYWORD_INTERFACE ID (DOT ID)*
(LBRACE! ((SCAN_END!)* (func_dec)+ (SCAN_END!)*)+ RBRACE!))
{ #interface_dec = #([INTERFACE_DEC, "interface_dec"], interface_dec);};
trigger_dec: (KEYWORD_TRIGGER ID (LBRACE! (SCAN_END!)*
LPAREN! expr RPAREN! (SCAN_END!)* RBRACE!) COLON! block)
{ #trigger_dec = #([TRIGGER_DEC, "trigger_dec"], trigger_dec);};
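To show what trees the expression rules are meant to produce, here is a hand-written recursive-descent sketch in Java of `arith_expr`, `term`, and a simplified `factor`. It leaves out array indexing and function calls, and it reduces the parenthesized `expr` to `arith_expr`. The class name `ExprTreeSketch`, the tiny tokenizer, and the LISP-style output string are assumptions made for illustration, not ANTLR output. The sketch mimics the `^` suffix by making each operator the root of the tree built so far, which makes the operators left-associative.

```java
import java.util.ArrayList;
import java.util.List;

final class ExprTreeSketch {
    private final List<String> toks = new ArrayList<>();
    private int pos = 0;

    private ExprTreeSketch(String src) {
        // Assumed tokenizer: runs of letters/digits/'_' form one token (ID or NUMBER);
        // any other non-space character is a one-character operator or paren.
        for (int i = 0; i < src.length(); ) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int j = i + 1;
            if (Character.isLetterOrDigit(c) || c == '_') {
                while (j < src.length()
                        && (Character.isLetterOrDigit(src.charAt(j)) || src.charAt(j) == '_')) j++;
            }
            toks.add(src.substring(i, j));
            i = j;
        }
    }

    static String parse(String src) {
        ExprTreeSketch p = new ExprTreeSketch(src);
        String tree = p.arithExpr();
        if (p.pos != p.toks.size()) throw new IllegalArgumentException("trailing input");
        return tree;
    }

    private String peek() { return pos < toks.size() ? toks.get(pos) : ""; }

    // arith_expr : term ((PLUS^ | SUB^) term)* ;
    private String arithExpr() {
        String left = term();
        while (peek().equals("+") || peek().equals("-")) {
            String op = toks.get(pos++);
            left = "(" + op + " " + left + " " + term() + ")"; // op becomes the new root
        }
        return left;
    }

    // term : factor ((MULT^ | DIV^ | PERCENTILE^) factor)* ;
    private String term() {
        String left = factor();
        while (peek().equals("*") || peek().equals("/") || peek().equals("%")) {
            String op = toks.get(pos++);
            left = "(" + op + " " + left + " " + factor() + ")";
        }
        return left;
    }

    // factor : ID | NUMBER | LPAREN! arith_expr RPAREN!   (simplified)
    private String factor() {
        String t = peek();
        if (t.equals("(")) {
            pos++;                        // LPAREN! is dropped from the tree
            String inner = arithExpr();
            if (!peek().equals(")")) throw new IllegalArgumentException("expected ')'");
            pos++;                        // RPAREN! is dropped too
            return inner;
        }
        if (t.isEmpty() || !(Character.isLetterOrDigit(t.charAt(0)) || t.charAt(0) == '_')) {
            throw new IllegalArgumentException("unexpected token '" + t + "'");
        }
        pos++;
        return t;                         // ID or NUMBER leaf
    }
}
```

For example, `a + b * c` gives `(+ a (* b c))` and `a - b - c` gives `(- (- a b) c)`. Precedence comes from the rule layering (term below arith_expr), and associativity comes from re-rooting inside the `*` loop.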
#--------------------------------------------------------------------------------------------
The warnings I got in the parser come from the "statement" rule:
nondeterminism upon k==1:SCAN_END k==2:SCAN_END, ID, LBRACE,
RBRACE,KEYWORD_IF,KEYWORD_ELSE,KEYWORD_LOOP, KEYWORD_RETURN
k==3:EOF,ID,SCAN_END......KEYWORD_INTERFACE,KEYWORD_TRIGGER between alt 1
and exit branch of block.
The other five warnings come from the same nonterminal.
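One plausible reading of "between alt 1 and exit branch" is that it refers to the trailing `(SCAN_END!)*` loops. After a statement, a run of SCAN_END tokens can be absorbed either by that statement's own loop (alt 1: keep looping) or by the `(SCAN_END!)*` in `block` just before `RBRACE` (exit branch). The Java sketch below checks this on a reduced model, assuming `statement : S (SCAN_END)*` with `S` standing in for any statement body, and `block : LBRACE (SCAN_END)* (statement)+ (SCAN_END)* RBRACE`. It counts distinct parses by brute force. The names `ScanEndAmbiguity`, `countBlockParses`, and the token spellings `"S"` and `"NL"` are hypothetical.

```java
import java.util.List;

final class ScanEndAmbiguity {
    static final String LBRACE = "{", RBRACE = "}", S = "S", SCAN_END = "NL";

    // Number of ways the whole token list parses as the reduced `block` rule.
    static int countBlockParses(List<String> t) {
        int n = t.size();
        if (n < 2 || !t.get(0).equals(LBRACE) || !t.get(n - 1).equals(RBRACE)) return 0;
        int i = 1;
        // Leading (SCAN_END)*: in this model a statement must start with S,
        // so the leading loop can only end where the first S appears.
        while (i < n - 1 && t.get(i).equals(SCAN_END)) i++;
        return countStatements(t, i, n - 1, 0);
    }

    // Ways to parse t[i..end) as the rest of (statement)+ (SCAN_END)*,
    // given that `done` statements have already been matched.
    private static int countStatements(List<String> t, int i, int end, int done) {
        int ways = 0;
        // Exit the (statement)+ loop: everything left must go to block's trailing loop.
        if (done > 0 && allScanEnd(t, i, end)) ways++;
        // Or match one more statement: S plus zero or more of its own SCAN_ENDs.
        if (i < end && t.get(i).equals(S)) {
            int j = i + 1;
            ways += countStatements(t, j, end, done + 1);
            while (j < end && t.get(j).equals(SCAN_END)) {
                j++;
                ways += countStatements(t, j, end, done + 1);
            }
        }
        return ways;
    }

    private static boolean allScanEnd(List<String> t, int from, int to) {
        for (int k = from; k < to; k++) {
            if (!t.get(k).equals(SCAN_END)) return false;
        }
        return true;
    }
}
```

In this model, `{ S NL S }` has exactly one parse, but `{ S NL NL }` has three, because the two trailing NL tokens can be split 2/0, 1/1, or 0/2 between the statement's loop and the block's loop. Since any number of SCAN_END tokens can precede RBRACE, raising k cannot remove that choice, which matches the warning persisting at k = 4.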
#------------------------------------------------------------------------------------------
Please help me trace the problem.
Thanks in advance,
Vinay