[antlr-interest] Can anyone help with a basic grammar problem in Antlr 3?
Ross Bamford
roscoml at gmail.com
Thu Oct 13 16:38:12 PDT 2011
Hi Guys,

I'm a bit of an ANTLR newbie. I've successfully created and used ANTLR 2
grammars in the past, but mostly by trial and error, and occasionally random
hacking until it "worked"... I've recently become involved in a project that
requires a very simple scripting language, and have decided to use ANTLR 3
for this, but I'm getting stuck quite early on. I think I have a
fundamental problem in my grammar, but after much hacking at it and trying
various ideas I got from Google, I'm still hitting a bit of a brick wall.

Basically I'm at the point where I have mathematical expressions and various
literal types implemented, and am adding in function and method call
handling. I want to be able to call methods with or without an explicit
receiver, and in my language parentheses are optional (I know that
complicates matters a bit, but it's what I need for this project). I've
written the grammar so far against a set of functional tests, and all is
well with most of my syntax. Here is my grammar:

/* ********* GRAMMAR *********** */
grammar BasicLang;

options {
    output=AST;
    ASTLabelType=CommonTree;
    backtrack=true;
    memoize=true;
}

tokens {
    ASSIGN;
    METHOD_CALL;
    SELF;
}

@parser::members {
    /* throw exceptions rather than silently failing... */
    protected void mismatch(IntStream input, int ttype, BitSet follow)
        throws RecognitionException
    {
        throw new MismatchedTokenException(ttype, input);
    }

    public Object recoverFromMismatchedSet(IntStream input,
            RecognitionException e, BitSet follow)
        throws RecognitionException
    {
        throw e;
    }
}

@rulecatch {
    // throw exceptions rather than silently failing...
    catch (RecognitionException e) {
        throw e;
    }
}

start_rule
    : script
    ;

script
    : statement*
    ;

statement
    : expr terminator!
    ;

expr
    : math_expr
    | assign_expr
    | meth_call_expr
    ;

meth_call_expr
    : (IDENTIFIER DOT)? func_call_expr
        -> ^(METHOD_CALL IDENTIFIER? func_call_expr)
    | (STRING_LITERAL DOT)? func_call_expr
        -> ^(METHOD_CALL STRING_LITERAL? func_call_expr)
    ;

fragment
func_call_expr
    : IDENTIFIER^ argument_list
    ;

fragment
argument_list
    : LPAREN!? (expr (COMMA! expr)*)? RPAREN!?
    ;

assign_expr
    : IDENTIFIER ASSIGN expr -> ^(ASSIGN IDENTIFIER expr)
    ;

math_expr
    : mult_expr ((ADD^|SUB^) mult_expr)*
    ;

mult_expr
    : pow_expr ((MUL^|DIV^|MOD^) pow_expr)*
    ;

pow_expr
    : unary_expr ((POW^) unary_expr)*
    ;

unary_expr
    : NOT? atom
    ;

atom
    : literal
    | LPAREN! expr RPAREN!
    ;

literal
    : HEX_LITERAL
    | DECIMAL_LITERAL
    | OCTAL_LITERAL
    | FLOATING_POINT_LITERAL
//  | REGEXP_LITERAL
    | STRING_LITERAL
    ;

terminator
    : TERMINATOR
    | EOF
    ;

POW : '^' ;
MOD : '%' ;
ADD : '+' ;
SUB : '-' ;
DIV : '/' ;
MUL : '*' ;
NOT : '!' ;

ASSIGN
    : '='
    ;

LPAREN
    : '('
    ;

RPAREN
    : ')'
    ;

COMMA
    : ','
    ;

DOT : '.' ;

CHARACTER_LITERAL
    : '\'' ( EscapeSequence | ~('\''|'\\') ) '\''
    ;

STRING_LITERAL
    : '"' ( EscapeSequence | ~('\\'|'"') )* '"'
    ;

/*
REGEXP_LITERAL
    : '/' ( EscapeSequence | ~('\\'|'"') )* '/'
    ;
*/

HEX_LITERAL : '0' ('x'|'X') HexDigit+ IntegerTypeSuffix? ;

DECIMAL_LITERAL : ('0' | '1'..'9' '0'..'9'*) IntegerTypeSuffix? ;

OCTAL_LITERAL : '0' ('0'..'7')+ IntegerTypeSuffix? ;

fragment
HexDigit : ('0'..'9'|'a'..'f'|'A'..'F') ;

fragment
IntegerTypeSuffix
    : ('l'|'L')
    | ('u'|'U') ('l'|'L')?
    ;

FLOATING_POINT_LITERAL
    : ('0'..'9')+ '.' ('0'..'9')* Exponent? FloatTypeSuffix?
    | '.' ('0'..'9')+ Exponent? FloatTypeSuffix?
    | ('0'..'9')+ Exponent? FloatTypeSuffix?
    ;

fragment
Exponent : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;

fragment
FloatTypeSuffix : ('f'|'F'|'d'|'D') ;

fragment
EscapeSequence
    : '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\'|'/')
    | OctalEscape
    ;

fragment
OctalEscape
    : '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    | '\\' ('0'..'7') ('0'..'7')
    | '\\' ('0'..'7')
    ;

fragment
UnicodeEscape
    : '\\' 'u' HexDigit HexDigit HexDigit HexDigit
    ;

COMMENT
    : '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
    ;

LINE_COMMENT
    : '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
    ;

IDENTIFIER
    : ID_LETTER (ID_LETTER|'0'..'9')*
    ;

fragment
ID_LETTER
    : '$'
    | 'A'..'Z'
    | 'a'..'z'
    | '_'
    ;

TERMINATOR
    : '\r'? '\n'
    | ';'
    ;

WS  : (' '|'\r'|'\t'|'\u000C') {$channel=HIDDEN;}
    | '...' '\r'? '\n' {$channel=HIDDEN;}
    ;

/* *************** END *************** */

With this grammar, my tests so far pass, and I'm building trees for simple
arithmetic operations, including ones involving variables (e.g. a+1),
and method calls are working as I expect, including when
passing method call results as args to another method call. But I cannot get
input such as "a=b+(c=1)" to parse at all. Debugging in ANTLRWorks shows me
that the problem occurs when the parser sees the "b+", at which point it
throws a NoViableAltException.
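
To make the target concrete: given the tree operators in the grammar above (ADD^ and the ^(ASSIGN IDENTIFIER expr) rewrite), the tree for that input would presumably be (= a (+ b (= c 1))). The following is only a hand-written Java sketch of that shape, not ANTLR-generated code and not part of the grammar. The class name ExpectedTree, its tiny tokenizer, and the LISP-style string output are all made up for illustration, and it assumes that an identifier is accepted as an operand and that assignment is tried first when an identifier is followed by '='. It has no error handling.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: a tiny recursive-descent parser that
// prints the tree shape one would expect for inputs like "a=b+(c=1)".
class ExpectedTree {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    public ExpectedTree(String src) {
        // Minimal tokenizer: identifiers/integers, and single-character operators.
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (Character.isLetterOrDigit(c) || c == '_') {
                int start = i;
                while (i < src.length()
                        && (Character.isLetterOrDigit(src.charAt(i)) || src.charAt(i) == '_')) {
                    i++;
                }
                tokens.add(src.substring(start, i));
            } else {
                tokens.add(String.valueOf(c));
                i++;
            }
        }
    }

    private String peek()  { return pos < tokens.size() ? tokens.get(pos) : ""; }
    private String peek2() { return pos + 1 < tokens.size() ? tokens.get(pos + 1) : ""; }
    private String next()  { return tokens.get(pos++); }

    // expr : IDENTIFIER '=' expr | math   (assignment is right-associative)
    public String expr() {
        if (isIdentifier(peek()) && peek2().equals("=")) {
            String name = next();
            next(); // consume '='
            return "(= " + name + " " + expr() + ")";
        }
        return math();
    }

    // math : mult (('+'|'-') mult)*
    private String math() {
        String left = mult();
        while (peek().equals("+") || peek().equals("-")) {
            String op = next();
            left = "(" + op + " " + left + " " + mult() + ")";
        }
        return left;
    }

    // mult : atom (('*'|'/') atom)*
    private String mult() {
        String left = atom();
        while (peek().equals("*") || peek().equals("/")) {
            String op = next();
            left = "(" + op + " " + left + " " + atom() + ")";
        }
        return left;
    }

    // atom : IDENTIFIER | NUMBER | '(' expr ')'
    private String atom() {
        String t = next();
        if (t.equals("(")) {
            String inner = expr();
            next(); // consume ')'
            return inner;
        }
        return t;
    }

    private static boolean isIdentifier(String t) {
        return !t.isEmpty() && (Character.isLetter(t.charAt(0)) || t.charAt(0) == '_');
    }

    public static void main(String[] args) {
        System.out.println(new ExpectedTree("a=b+(c=1)").expr());
    }
}
```

The sketch decides between assignment and arithmetic with one extra token of lookahead (an identifier followed by '='), which is one possible way to keep the two alternatives from competing for the same leading identifier. Whether that carries over cleanly to the grammar above, with its optional parentheses on calls, is exactly what I'm unsure about.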

I guessed this was because the parser doesn't see the identifier as an atom,
so it tries to parse it together with the + symbol. So I tried adding
IDENTIFIER as an alternative to the atom rule, but that just broke the parser
completely, and many of my tests failed with a MismatchedSetException.

I've been playing with this for a few days now, but no matter what I do, even
when I get the type of syntax I mentioned above (the assignment statement)
working, I invariably break something else (or more often, everything! :( ).
I'm really hoping someone out there will take pity on me and give me some
insight into what I'm doing wrong.

Thanks in advance!

--
Ross Bamford - roscoml at gmail.com