[antlr-interest] Can anyone help with a basic grammar problem in Antlr 3?
Ross Bamford
roscoml at gmail.com
Thu Oct 13 17:42:40 PDT 2011
Hi Michael,
Thanks for the response! And thanks for being kind about my basic grammar
:)
I tried reordering the alternatives in expr as you suggested, and I'm a bit
closer now than I was before! It's definitely parsing "a = 1 + (b = 2)" fine,
but I'm still seeing NoViableAltExceptions with, for example, "a=b+(c=2)".
Stepping through in the debugger, it still seems to be trying to grab "b+"
as a token, rather than seeing the "b" and then the "+", which is why I
previously tried adding IDENTIFIER to the "atom" rule. I tried adding it
again after making the change you suggested, but it still caused a lot of
problems in other places.
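For reference, the atom change I mean is roughly the following. This is only
a sketch of that one rule, assuming everything else stays exactly as in the
grammar quoted below. The note about the overlap is my reading of the grammar,
not something I have confirmed in the debugger.

```antlr
// Sketch only: the "atom" rule from the quoted grammar with IDENTIFIER added,
// so that a bare name such as "b" can be an operand of math_expr.
atom
    : literal
    | IDENTIFIER
    | LPAREN! expr RPAREN!
    ;

// Note: func_call_expr is "IDENTIFIER^ argument_list", and every element of
// argument_list is optional, so a lone IDENTIFIER also matches meth_call_expr.
// With this change, the same input can be read either as a variable
// reference (atom) or as a call with no arguments.
```

As far as I understand it, with backtrack=true the first alternative of expr
that matches is the one taken, so that overlap may be part of why the other
tests start failing.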
Thanks,
Ross
On Fri, Oct 14, 2011 at 1:04 AM, Michael Bedward
<michael.bedward at gmail.com> wrote:
> Hi Ross,
>
> For a bit of a newbie that's a nice grammar - much neater than any of mine
> :)
>
> If you rearrange your expr rule so that the assign_expr is the first
> alternative...
>
> expr
> : assign_expr
> | math_expr
> | meth_call_expr
> ;
>
> ...I think that the grammar should be able to parse things like a = 1 + (b
> = 2)
>
> Michael
>
>
> On 14 October 2011 10:38, Ross Bamford <roscoml at gmail.com> wrote:
> > Hi Guys,
> >
> > I'm a bit of an Antlr newbie - I've successfully created and used Antlr 2
> > grammars in the past but mostly by trial and error, and occasionally random
> > hacking until it "worked"... I've recently become involved in a project that
> > requires a very simple scripting language, and have decided to use Antlr 3
> > for this, but I'm getting stuck quite early on - I think I have a
> > fundamental problem in my grammar but after much hacking at it and trying
> > various ideas I got from Google, I'm still hitting a bit of a brick wall.
> >
> > Basically I'm at the point where I have mathematical expressions and various
> > literal types implemented, and am adding in function and method call
> > handling - I want to be able to call methods with or without an explicit
> > receiver, and in my language parentheses are optional (I know that
> > complicates matters a bit but it's what I need for this project). I've
> > written the grammar so far against a set of functional tests, and all is
> > well with most of my syntax. Here is my grammar:
> >
> > /* ********* GRAMMAR *********** */
> > grammar BasicLang;
> >
> > options {
> > output=AST;
> > ASTLabelType=CommonTree;
> > backtrack=true;
> > memoize=true;
> > }
> >
> > tokens {
> > ASSIGN;
> > METHOD_CALL;
> > SELF;
> > }
> >
> > @parser::members {
> > /* throw exceptions rather than silently failing... */
> > protected void mismatch(IntStream input, int ttype, BitSet follow)
> > throws RecognitionException
> > {
> > throw new MismatchedTokenException(ttype, input);
> > }
> > public Object recoverFromMismatchedSet(IntStream input,
> > RecognitionException e, BitSet follow)
> > throws RecognitionException
> > {
> > throw e;
> > }
> > }
> >
> > @rulecatch {
> > // throw exceptions rather than silently failing...
> > catch (RecognitionException e) {
> > throw e;
> > }
> > }
> >
> > start_rule
> > : script
> > ;
> >
> > script
> > : statement*
> > ;
> >
> > statement
> > : expr terminator!
> > ;
> >
> > expr
> > : math_expr
> > | assign_expr
> > | meth_call_expr
> > ;
> >
> > meth_call_expr
> > : (IDENTIFIER DOT)? func_call_expr -> ^(METHOD_CALL IDENTIFIER? func_call_expr)
> > | (STRING_LITERAL DOT)? func_call_expr -> ^(METHOD_CALL STRING_LITERAL? func_call_expr)
> > ;
> >
> > fragment
> > func_call_expr
> > : IDENTIFIER^ argument_list
> > ;
> >
> > fragment
> > argument_list
> > : LPAREN!? (expr (COMMA! expr)*)? RPAREN!?
> > ;
> >
> > assign_expr
> > : IDENTIFIER ASSIGN expr -> ^(ASSIGN IDENTIFIER expr)
> > ;
> >
> > math_expr
> > : mult_expr ((ADD^|SUB^) mult_expr)*
> > ;
> >
> > mult_expr
> > : pow_expr ((MUL^|DIV^|MOD^) pow_expr)*
> > ;
> >
> > pow_expr
> > : unary_expr ((POW^) unary_expr)*
> > ;
> >
> > unary_expr
> > : NOT? atom
> > ;
> >
> > atom
> > : literal
> > | LPAREN! expr RPAREN!
> > ;
> >
> > literal
> > : HEX_LITERAL
> > | DECIMAL_LITERAL
> > | OCTAL_LITERAL
> > | FLOATING_POINT_LITERAL
> > // | REGEXP_LITERAL
> > | STRING_LITERAL
> > ;
> >
> > terminator
> > : TERMINATOR
> > | EOF
> > ;
> >
> > POW : '^' ;
> > MOD : '%' ;
> > ADD : '+' ;
> > SUB : '-' ;
> > DIV : '/' ;
> > MUL : '*' ;
> > NOT : '!' ;
> >
> > ASSIGN
> > : '='
> > ;
> >
> > LPAREN
> > : '('
> > ;
> >
> > RPAREN
> > : ')'
> > ;
> >
> > COMMA
> > : ','
> > ;
> >
> > DOT : '.' ;
> >
> > CHARACTER_LITERAL
> > : '\'' ( EscapeSequence | ~('\''|'\\') ) '\''
> > ;
> >
> > STRING_LITERAL
> > : '"' ( EscapeSequence | ~('\\'|'"') )* '"'
> > ;
> >
> > /*
> > REGEXP_LITERAL
> > : '/' ( EscapeSequence | ~('\\'|'"') )* '/'
> > ;
> > */
> >
> > HEX_LITERAL : '0' ('x'|'X') HexDigit+ IntegerTypeSuffix? ;
> >
> > DECIMAL_LITERAL : ('0' | '1'..'9' '0'..'9'*) IntegerTypeSuffix? ;
> >
> > OCTAL_LITERAL : '0' ('0'..'7')+ IntegerTypeSuffix? ;
> >
> > fragment
> > HexDigit : ('0'..'9'|'a'..'f'|'A'..'F') ;
> >
> > fragment
> > IntegerTypeSuffix
> > : ('l'|'L')
> > | ('u'|'U') ('l'|'L')?
> > ;
> >
> > FLOATING_POINT_LITERAL
> > : ('0'..'9')+ '.' ('0'..'9')* Exponent? FloatTypeSuffix?
> > | '.' ('0'..'9')+ Exponent? FloatTypeSuffix?
> > | ('0'..'9')+ Exponent? FloatTypeSuffix?
> > ;
> >
> > fragment
> > Exponent : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
> >
> > fragment
> > FloatTypeSuffix : ('f'|'F'|'d'|'D') ;
> >
> > fragment
> > EscapeSequence
> > : '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\'|'/')
> > | OctalEscape
> > ;
> >
> > fragment
> > OctalEscape
> > : '\\' ('0'..'3') ('0'..'7') ('0'..'7')
> > | '\\' ('0'..'7') ('0'..'7')
> > | '\\' ('0'..'7')
> > ;
> >
> > fragment
> > UnicodeEscape
> > : '\\' 'u' HexDigit HexDigit HexDigit HexDigit
> > ;
> > COMMENT
> > : '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
> > ;
> >
> > LINE_COMMENT
> > : '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
> > ;
> >
> > IDENTIFIER
> > : ID_LETTER (ID_LETTER|'0'..'9')*
> > ;
> >
> > fragment
> > ID_LETTER
> > : '$'
> > | 'A'..'Z'
> > | 'a'..'z'
> > | '_'
> > ;
> >
> > TERMINATOR
> > : '\r'? '\n'
> > | ';'
> > ;
> >
> > WS : (' '|'\r'|'\t'|'\u000C') {$channel=HIDDEN;}
> > | '...' '\r'? '\n' {$channel=HIDDEN;}
> > ;
> >
> > /* *************** END *************** */
> >
> > With this grammar, my tests so far pass, and I'm building trees for simple
> > arithmetic operations, including ones involving variables (e.g. a+1), and
> > method calls are working as I expect, including when passing method call
> > results as args to another method call. But I cannot get input such as
> > "a=b+(c=1)" to parse at all - debugging in AntlrWorks shows me that the
> > problem occurs when the parser sees the "b+", at which point it throws a
> > NoViableAltException.
> >
> > I guessed this was because the parser doesn't see the identifier as an atom,
> > so tries to parse it with the + symbol. So, I tried adding IDENTIFIER as an
> > alternative to the atom rule - but that just broke the parser completely, and
> > many of my tests failed with a MismatchedSetException.
> >
> > I've been playing with this for a few days now, but no matter what I do, even
> > when I get the type of syntax I mentioned above (the assign statement)
> > working, I invariably break something else (or more often, everything! :( ).
> > I'm really hoping someone out there will take pity on me and give me some
> > insight into what I'm doing wrong.
> >
> > Thanks in advance!
> > --
> > Ross Bamford - roscoml at gmail.com