[antlr-interest] Can anyone help with a basic grammar problem in Antlr 3?

Ross Bamford roscoml at gmail.com
Thu Oct 13 17:42:40 PDT 2011


Hi Michael,

Thanks for the response! And thanks for being kind about my basic grammar
:)

I tried reordering the alternatives in expr as you suggested, and am a bit
closer now than I was before! It's definitely parsing a = 1 + (b = 2) fine,
but I'm still seeing NoViableAltExceptions with, for example, "a=b+(c=2)".
Stepping through in the debugger, it still seems to be trying to grab "b+"
as a single token, rather than seeing the "b" and then the "+", which is why
I previously tried adding IDENTIFIER to the "atom" rule. I tried adding it
again after making the change you suggested, but it still caused a lot of
problems in other places.
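
To make concrete the behaviour I'm after, here is a minimal hand-written
Java sketch. It is not ANTLR-generated code and not my actual grammar: the
class name ExprSketch, its methods and the s-expression output format are all
made up for illustration, and it only covers identifiers, decimal integers,
the + - * / % operators and parentheses. It assumes an identifier is accepted
as an atom, and that assignment is tried first using one token of lookahead.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written illustration only; not ANTLR output, not the BasicLang grammar. */
class ExprSketch {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private ExprSketch(String src) {
        // Tiny lexer: identifiers, decimal integers, single-character symbols.
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int start = i++;
            if (isIdStart(c)) {
                while (i < src.length()
                        && (isIdStart(src.charAt(i)) || Character.isDigit(src.charAt(i)))) i++;
            } else if (Character.isDigit(c)) {
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
            }
            tokens.add(src.substring(start, i));
        }
    }

    /** Parses one expression and returns it as an s-expression string. */
    static String parse(String src) {
        ExprSketch p = new ExprSketch(src);
        String tree = p.expr();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("unexpected token: " + p.peek(0));
        }
        return tree;
    }

    private static boolean isIdStart(char c) {
        return Character.isLetter(c) || c == '_' || c == '$';
    }

    // Returns the token k positions ahead, or "" past the end of input.
    private String peek(int k) {
        return pos + k < tokens.size() ? tokens.get(pos + k) : "";
    }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("unexpected end of input");
        return tokens.get(pos++);
    }

    // expr : IDENTIFIER '=' expr | math_expr   (assignment is tried first)
    private String expr() {
        String t = peek(0);
        if (!t.isEmpty() && isIdStart(t.charAt(0)) && peek(1).equals("=")) {
            String name = next();
            next(); // consume '='
            return "(= " + name + " " + expr() + ")";
        }
        return mathExpr();
    }

    // math_expr : mult_expr (('+'|'-') mult_expr)*
    private String mathExpr() {
        String left = multExpr();
        while (peek(0).equals("+") || peek(0).equals("-")) {
            String op = next();
            left = "(" + op + " " + left + " " + multExpr() + ")";
        }
        return left;
    }

    // mult_expr : atom (('*'|'/'|'%') atom)*
    private String multExpr() {
        String left = atom();
        while (peek(0).equals("*") || peek(0).equals("/") || peek(0).equals("%")) {
            String op = next();
            left = "(" + op + " " + left + " " + atom() + ")";
        }
        return left;
    }

    // atom : IDENTIFIER | integer | '(' expr ')'
    private String atom() {
        String t = next();
        if (t.equals("(")) {
            String inner = expr();
            if (!next().equals(")")) throw new IllegalArgumentException("expected ')'");
            return inner;
        }
        char c = t.charAt(0);
        if (isIdStart(c) || Character.isDigit(c)) return t;
        throw new IllegalArgumentException("no viable alternative at '" + t + "'");
    }

    public static void main(String[] args) {
        System.out.println(parse("a=b+(c=2)")); // prints (= a (+ b (= c 2)))
    }
}
```

In this sketch the "b" is consumed by atom() on its own, so the "+" is left
for mathExpr() to pick up. It deliberately leaves out method calls with
optional parentheses, which is exactly where a bare IDENTIFIER atom starts to
conflict with the rest of my grammar.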

Thanks,
Ross


On Fri, Oct 14, 2011 at 1:04 AM, Michael Bedward
<michael.bedward at gmail.com> wrote:

> Hi Ross,
>
> For a bit of a newbie that's a nice grammar - much neater than any of mine
> :)
>
> If you rearrange your expr rule so that the assign_expr is the first
> alternative...
>
> expr
>  :   assign_expr
>  |   math_expr
>  |   meth_call_expr
>  ;
>
> ...I think that the grammar should be able to parse things like a = 1 + (b
> = 2)
>
> Michael
>
>
> On 14 October 2011 10:38, Ross Bamford <roscoml at gmail.com> wrote:
> > Hi Guys,
> >
> > I'm a bit of an Antlr newbie - I've successfully created and used Antlr 2
> > grammars in the past but mostly by trial and error, and occasionally
> > random hacking until it "worked"... I've recently become involved in a
> > project that requires a very simple scripting language, and have decided
> > to use Antlr 3 for this, but I'm getting stuck quite early on - I think I
> > have a fundamental problem in my grammar but after much hacking at it and
> > trying various ideas I got from Google, I'm still hitting a bit of a brick
> > wall.
> >
> > Basically I'm at the point where I have mathematical expressions and
> > various literal types implemented, and am adding in function and method
> > call handling - I want to be able to call methods with or without an
> > explicit receiver, and in my language parentheses are optional (I know
> > that complicates matters a bit but it's what I need for this project).
> > I've written the grammar so far against a set of functional tests, and
> > all is well with most of my syntax. Here is my grammar:
> >
> > /* ********* GRAMMAR *********** */
> > grammar BasicLang;
> >
> > options {
> >    output=AST;
> >    ASTLabelType=CommonTree;
> >    backtrack=true;
> >    memoize=true;
> > }
> >
> > tokens {
> >  ASSIGN;
> >  METHOD_CALL;
> >  SELF;
> > }
> >
> > @parser::members {
> >   /* throw exceptions rather than silently failing... */
> >   protected void mismatch(IntStream input, int ttype, BitSet follow)
> >     throws RecognitionException
> >   {
> >     throw new MismatchedTokenException(ttype, input);
> >   }
> >
> >   public Object recoverFromMismatchedSet(IntStream input,
> >       RecognitionException e, BitSet follow)
> >     throws RecognitionException
> >   {
> >     throw e;
> >   }
> > }
> >
> > @rulecatch {
> > // throw exceptions rather than silently failing...
> > catch (RecognitionException e) {
> >  throw e;
> > }
> > }
> >
> > start_rule
> >  :   script
> >  ;
> >
> > script
> >  :   statement*
> >  ;
> >
> > statement
> >  :   expr terminator!
> >  ;
> >
> > expr
> >  :   math_expr
> >  |   assign_expr
> >  |   meth_call_expr
> >  ;
> >
> > meth_call_expr
> >  :   (IDENTIFIER DOT)? func_call_expr
> >        -> ^(METHOD_CALL IDENTIFIER? func_call_expr)
> >  |   (STRING_LITERAL DOT)? func_call_expr
> >        -> ^(METHOD_CALL STRING_LITERAL? func_call_expr)
> >  ;
> >
> > fragment
> > func_call_expr
> >  :   IDENTIFIER^ argument_list
> >  ;
> >
> > fragment
> > argument_list
> >  :   LPAREN!? (expr (COMMA! expr)*)? RPAREN!?
> >  ;
> >
> > assign_expr
> >  :   IDENTIFIER ASSIGN expr -> ^(ASSIGN IDENTIFIER expr)
> >  ;
> >
> > math_expr
> >  :   mult_expr ((ADD^|SUB^) mult_expr)*
> >  ;
> >
> > mult_expr
> >  :   pow_expr ((MUL^|DIV^|MOD^) pow_expr)*
> >  ;
> >
> > pow_expr
> >  :   unary_expr ((POW^) unary_expr)*
> >  ;
> >
> > unary_expr
> >  :   NOT? atom
> >  ;
> >
> > atom
> >  :     literal
> >  |     LPAREN! expr RPAREN!
> >  ;
> >
> > literal
> >  :     HEX_LITERAL
> >  |     DECIMAL_LITERAL
> >  |     OCTAL_LITERAL
> >  |     FLOATING_POINT_LITERAL
> > //  |     REGEXP_LITERAL
> >  |     STRING_LITERAL
> >  ;
> >
> > terminator
> >  :     TERMINATOR
> >  |     EOF
> >  ;
> >
> > POW :   '^' ;
> > MOD :   '%' ;
> > ADD :   '+' ;
> > SUB :   '-' ;
> > DIV :   '/' ;
> > MUL :   '*' ;
> > NOT :   '!' ;
> >
> > ASSIGN
> >    :   '='
> >    ;
> >
> > LPAREN
> >    :   '('
> >    ;
> >
> > RPAREN
> >    :   ')'
> >    ;
> >
> > COMMA
> >    :   ','
> >    ;
> >
> > DOT :   '.' ;
> >
> > CHARACTER_LITERAL
> >    :   '\'' ( EscapeSequence | ~('\''|'\\') ) '\''
> >    ;
> >
> > STRING_LITERAL
> >    :  '"' ( EscapeSequence | ~('\\'|'"') )* '"'
> >    ;
> >
> > /*
> > REGEXP_LITERAL
> >    :  '/' ( EscapeSequence | ~('\\'|'"') )* '/'
> >    ;
> > */
> >
> > HEX_LITERAL : '0' ('x'|'X') HexDigit+ IntegerTypeSuffix? ;
> >
> > DECIMAL_LITERAL : ('0' | '1'..'9' '0'..'9'*) IntegerTypeSuffix? ;
> >
> > OCTAL_LITERAL : '0' ('0'..'7')+ IntegerTypeSuffix? ;
> >
> > fragment
> > HexDigit : ('0'..'9'|'a'..'f'|'A'..'F') ;
> >
> > fragment
> > IntegerTypeSuffix
> >  : ('l'|'L')
> >  | ('u'|'U')  ('l'|'L')?
> >  ;
> >
> > FLOATING_POINT_LITERAL
> >    :   ('0'..'9')+ '.' ('0'..'9')* Exponent? FloatTypeSuffix?
> >    |   '.' ('0'..'9')+ Exponent? FloatTypeSuffix?
> >    |   ('0'..'9')+ Exponent? FloatTypeSuffix?
> >  ;
> >
> > fragment
> > Exponent : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
> >
> > fragment
> > FloatTypeSuffix : ('f'|'F'|'d'|'D') ;
> >
> > fragment
> > EscapeSequence
> >    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\'|'/')
> >    |   OctalEscape
> >    ;
> >
> > fragment
> > OctalEscape
> >    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
> >    |   '\\' ('0'..'7') ('0'..'7')
> >    |   '\\' ('0'..'7')
> >    ;
> >
> > fragment
> > UnicodeEscape
> >    :   '\\' 'u' HexDigit HexDigit HexDigit HexDigit
> >    ;
> >
> > COMMENT
> >    :   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
> >    ;
> >
> > LINE_COMMENT
> >    : '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
> >    ;
> >
> > IDENTIFIER
> >  : ID_LETTER (ID_LETTER|'0'..'9')*
> >  ;
> >
> > fragment
> > ID_LETTER
> >  : '$'
> >  | 'A'..'Z'
> >  | 'a'..'z'
> >  | '_'
> >  ;
> >
> > TERMINATOR
> >  : '\r'? '\n'
> >  | ';'
> >  ;
> >
> > WS  :  (' '|'\r'|'\t'|'\u000C') {$channel=HIDDEN;}
> >    |  '...' '\r'? '\n'  {$channel=HIDDEN;}
> >    ;
> >
> > /* *************** END *************** */
> >
> > With this grammar, my tests so far pass, and I'm building trees for
> > simple arithmetic operations and the like, including involving variables
> > (e.g. a+1 and the like), and method calls are working as I expect,
> > including when passing method call results as args to another method
> > call. But I cannot get input such as "a=b+(c=1)" to parse at all -
> > debugging in AntlrWorks shows me that the problem occurs when the parser
> > sees the "b+", when it throws a NoViableAltException.
> >
> > I guessed this was because the parser doesn't see the identifier as an
> > atom, so tries to parse it with the + symbol. So, I tried adding
> > IDENTIFIER as an alternative to the atom rule - but that just broke the
> > parser completely and many of my tests failed with an exception -
> > MismatchedSetException.
> >
> > I've been playing with this for a few days now but no matter what I do,
> > even when I get the type of syntax I mentioned above (the assign
> > statement) working, I invariably break something else (or more often,
> > everything! :( ). I'm really hoping someone out there will take pity on
> > me and give me some insight into what I'm doing wrong.
> >
> > Thanks in advance!
> > --
> > Ross Bamford - roscoml at gmail.com

