[antlr-interest] Can anyone help with a basic grammar problem in Antlr 3?

Ross Bamford roscoml at gmail.com
Thu Oct 13 16:38:12 PDT 2011


Hi Guys,

I'm a bit of an ANTLR newbie. I've successfully created and used ANTLR 2
grammars in the past, but mostly by trial and error and occasional random
hacking until it "worked". I've recently become involved in a project that
requires a very simple scripting language and have decided to use ANTLR 3
for it, but I'm getting stuck quite early on. I think I have a
fundamental problem in my grammar, but after much hacking at it and trying
various ideas I got from Google, I'm still hitting a brick wall.

Basically, I'm at the point where I have mathematical expressions and various
literal types implemented, and am adding function and method call
handling. I want to be able to call methods with or without an explicit
receiver, and in my language parentheses are optional (I know that
complicates matters a bit, but it's what I need for this project). I've
written the grammar so far against a set of functional tests, and all is
well with most of my syntax. Here is my grammar:

/* ********* GRAMMAR *********** */
grammar BasicLang;

options {
    output=AST;
    ASTLabelType=CommonTree;
    backtrack=true;
    memoize=true;
}

tokens {
  ASSIGN;
  METHOD_CALL;
  SELF;
}

@parser::members {
  /* throw exceptions rather than silently failing... */
  protected void mismatch(IntStream input, int ttype, BitSet follow)
    throws RecognitionException
  {
    throw new MismatchedTokenException(ttype, input);
  }

  public Object recoverFromMismatchedSet(IntStream input,
                                         RecognitionException e,
                                         BitSet follow)
    throws RecognitionException
  {
    throw e;
  }
}

@rulecatch {
// throw exceptions rather than silently failing...
catch (RecognitionException e) {
  throw e;
}
}

start_rule
  :   script
  ;

script
  :   statement*
  ;

statement
  :   expr terminator!
  ;

expr
  :   math_expr
  |   assign_expr
  |   meth_call_expr
  ;

meth_call_expr
  :   (IDENTIFIER DOT)? func_call_expr -> ^(METHOD_CALL IDENTIFIER? func_call_expr)
  |   (STRING_LITERAL DOT)? func_call_expr -> ^(METHOD_CALL STRING_LITERAL? func_call_expr)
  ;

fragment
func_call_expr
  :   IDENTIFIER^ argument_list
  ;

fragment
argument_list
  :   LPAREN!? (expr (COMMA! expr)*)? RPAREN!?
  ;

assign_expr
  :   IDENTIFIER ASSIGN expr -> ^(ASSIGN IDENTIFIER expr)
  ;

math_expr
  :   mult_expr ((ADD^|SUB^) mult_expr)*
  ;

mult_expr
  :   pow_expr ((MUL^|DIV^|MOD^) pow_expr)*
  ;

pow_expr
  :   unary_expr ((POW^) unary_expr)*
  ;

unary_expr
  :   NOT? atom
  ;

atom
  :     literal
  |     LPAREN! expr RPAREN!
  ;

literal
  :     HEX_LITERAL
  |     DECIMAL_LITERAL
  |     OCTAL_LITERAL
  |     FLOATING_POINT_LITERAL
//  |     REGEXP_LITERAL
  |     STRING_LITERAL
  ;

terminator
  :     TERMINATOR
  |     EOF
  ;

POW :   '^' ;
MOD :   '%' ;
ADD :   '+' ;
SUB :   '-' ;
DIV :   '/' ;
MUL :   '*' ;
NOT :   '!' ;

ASSIGN
    :   '='
    ;

LPAREN
    :   '('
    ;

RPAREN
    :   ')'
    ;

COMMA
    :   ','
    ;

DOT :   '.' ;

CHARACTER_LITERAL
    :   '\'' ( EscapeSequence | ~('\''|'\\') ) '\''
    ;

STRING_LITERAL
    :  '"' ( EscapeSequence | ~('\\'|'"') )* '"'
    ;

/*
REGEXP_LITERAL
    :  '/' ( EscapeSequence | ~('\\'|'"') )* '/'
    ;
*/

HEX_LITERAL : '0' ('x'|'X') HexDigit+ IntegerTypeSuffix? ;

DECIMAL_LITERAL : ('0' | '1'..'9' '0'..'9'*) IntegerTypeSuffix? ;

OCTAL_LITERAL : '0' ('0'..'7')+ IntegerTypeSuffix? ;

fragment
HexDigit : ('0'..'9'|'a'..'f'|'A'..'F') ;

fragment
IntegerTypeSuffix
  : ('l'|'L')
  | ('u'|'U')  ('l'|'L')?
  ;

FLOATING_POINT_LITERAL
    :   ('0'..'9')+ '.' ('0'..'9')* Exponent? FloatTypeSuffix?
    |   '.' ('0'..'9')+ Exponent? FloatTypeSuffix?
    |   ('0'..'9')+ Exponent? FloatTypeSuffix?
  ;

fragment
Exponent : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;

fragment
FloatTypeSuffix : ('f'|'F'|'d'|'D') ;

fragment
EscapeSequence
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\'|'/')
    |   OctalEscape
    ;

fragment
OctalEscape
    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7')
    ;

fragment
UnicodeEscape
    :   '\\' 'u' HexDigit HexDigit HexDigit HexDigit
    ;

COMMENT
    :   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
    ;

LINE_COMMENT
    : '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
    ;

IDENTIFIER
  : ID_LETTER (ID_LETTER|'0'..'9')*
  ;

fragment
ID_LETTER
  : '$'
  | 'A'..'Z'
  | 'a'..'z'
  | '_'
  ;

TERMINATOR
  : '\r'? '\n'
  | ';'
  ;

WS  :  (' '|'\r'|'\t'|'\u000C') {$channel=HIDDEN;}
    |  '...' '\r'? '\n'  {$channel=HIDDEN;}
    ;

/* *************** END *************** */

With this grammar, my tests so far pass. I'm building trees for simple
arithmetic operations, including ones involving variables (e.g. a+1), and
method calls are working as I expect, including when passing method call
results as args to another method call. But I cannot get input such as
"a=b+(c=1)" to parse at all. Debugging in ANTLRWorks shows me that the
problem occurs when the parser sees the "b+", at which point it throws a
NoViableAltException.

I guessed this was because the parser doesn't see the identifier as an atom,
so tries to parse it with the + symbol. So, I tried adding IDENTIFIER as an
alternative to the atom rule - but that just broke the parser completely and
many of my tests failed with an exception - MismatchedSetException.
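
A minimal sketch of that change, assuming the only edit is one extra
alternative in the atom rule (its position in the rule is an assumption) and
everything else stays exactly as posted above:

/* ********* ATOM WITH IDENTIFIER (sketch) *********** */
atom
  :     literal
  |     IDENTIFIER             // added alternative; placement is assumed
  |     LPAREN! expr RPAREN!
  ;
/* *************** END *************** */

With this alternative in place, an IDENTIFIER can begin either an atom or a
func_call_expr, and since the parentheses in argument_list are optional, the
two overlap on input that starts with an identifier. That overlap may be
related to the failures.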

I've been playing with this for a few days now, but no matter what I do, even
when I get the kind of syntax I mentioned above (the assign statement)
working, I invariably break something else (or, more often, everything! :( ).
I'm really hoping someone out there will take pity on me and give me some
insight into what I'm doing wrong.

Thanks in advance!
-- 
Ross Bamford - roscoml at gmail.com

