[antlr-interest] Is this nondeterminism solvable?
Steffen Schuemann
s.schuemann at pobox.com
Mon Jul 5 23:49:55 PDT 2004
I'm implementing a scripting language for a group of PBEMs. It is a
complete redesign of an existing tool, and my working prototype,
consisting of a compiler of sorts and a VM backend, is about 40 times
faster than my old tool. So I'm glad I started this redesign. ;-)
Right now I'm fighting a nondeterminism problem, with no luck so
far.
My grammar has become rather complex by now, but I reduced it to this:
// ---- begin: test.g -----------------------------------------
options {
    language="Cpp";
}

class MyParser extends Parser;
options {
    buildAST=true;
}

script
    : scriptElement (SSEP! scriptElement)* EOF!
    | SSEP! EOF!
    ;

scriptElement
    : subroutine
    ;

subroutine
    : CMD_PROC^ WS! name:IDENT (WS! VARIABLE)* statementBlock
    ;

statementBlock
    : SBLOCK^ (statement (SSEP! statement)*)* EBLOCK
    ;

statement
    : assignExpression
    ;

assignExpression
    : expression (ASSIGN^ expression)?
    ;

expressionList
    : expression (COMMA! expression)*
    ;

expression
    : additiveExpression
    ;

additiveExpression
    : multiplicativeExpression ((PLUS^|MINUS^) multiplicativeExpression)*
    ;

multiplicativeExpression
    : postfixExpression (STAR^ postfixExpression)*
    ;

postfixExpression
    : LPAREN! expression RPAREN!
    /*
       The following grammar line results in this message:
         warning:nondeterminism upon
           k==1:LPAREN
           between alt 2 and exit branch of block
    */
    | atom ((LBRACK^ expressionList RBRACK!)|(LPAREN^ (expressionList)? RPAREN!))*
    ;

atom
    : INTEGER
    | IDENT
    ;

class MyLexer extends Lexer;
options {
    k=2;
}

CMD_PROC
    : "#proc"
    ;

WS
    : (WS_)+
      (
        (NL_ (NL_|WS_)*) {_ttype=SSEP;}
        ( '{' {inputState->tokenStartLine = getLine();} (NL_|WS_)* {_ttype=SBLOCK;}
        | '}' {inputState->tokenStartLine = getLine();} (WS_)* {_ttype=EBLOCK;}
        )?
      )?
    ;

SSEP
    : ':' (WS_)*
    | (NL_ (NL_|WS_)*)
      ( '{' {inputState->tokenStartLine = getLine();} (NL_|WS_)* {_ttype=SBLOCK;}
      | '}' {inputState->tokenStartLine = getLine();} (WS_)* {_ttype=EBLOCK;}
      )?
    ;

SBLOCK
    : '{' (NL_|WS_)*
    ;

EBLOCK
    : '}' (WS_)*
    ;

protected
WS_
    : ( ' '
      | '\t'
      )
    ;

protected
NL_ options {generateAmbigWarnings=false;}
    : ( { LA(2)=='\n' }? "\r\n" // DOS/Windows
      | '\r'                    // Macintosh
      | '\n'                    // Unix
      )
      { newline(); }
    ;

LPAREN
    : '('
    ;

RPAREN
    : ')'
    ;

LBRACK
    : '['
    ;

RBRACK
    : ']'
    ;

STAR
    : '*'
    ;

PLUS
    : '+'
    ;

MINUS
    : '-'
    ;

ASSIGN
    : '='
    ;

COMMA
    : ','
    ;

IDENT
    : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*
    ;

VARIABLE
    : '$' ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*
    ;

INTEGER
    : ('1'..'9') ('0'..'9')*
    ;
// ---- end: test.g -----------------------------------------
As you can see, there are some complex rules involved in the handling
of whitespace and newlines combined with ':', '{' and '}'. The language
has some unusual aspects:
- spaces (outside of strings) separate parameters
  (as a result, spaces are not allowed inside an expression)
- ':' separates statements on the same line
- newlines separate statements
- only statements with blocks (#proc/#while/#if) can have
  newlines around curly brackets
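To make these rules concrete, here is a small made-up script. The
procedure name, the variables and the statements are invented purely
for illustration, and it is only a sketch of what I expect the reduced
grammar above to accept, not a tested case. The expressions use plain
identifiers because atom in the reduced grammar only knows INTEGER and
IDENT:

```
#proc add $a $b
{
sum=a+b:result=sum*2
print(result)
}
```

The spaces on the #proc line separate the parameters, ':' separates the
first two statements on one line, a newline separates them from the
third, and the curly brackets of the #proc block sit on lines of their
own.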
There are no #if/#while statements in this reduced grammar,
but the exotic whitespace/newline handling is still there,
just in case it is the source of all my trouble.
Is the nondeterminism solvable, or will I have to disable
the warning? I have been trying for some days now, and I am
starting to get a bit frustrated. ;-)
Any help (or a hint on improving the whitespace/newline
handling) is really appreciated.
PS: If someone is interested in the full grammar, I can
upload it to my site. Of course the compiler/interpreter
and VM will be open source, but they are rather specific
to German Atlantis PBEMs at the moment.
--
web: http://www.gulrak.net