[antlr-interest] Can anyone help with a basic grammar problem in Antlr 3?

Thu Oct 13 18:12:00 PDT 2011

Ah yes, it's getting stuck on the "b" because you haven't told it that
identifiers are atoms...

atom
 :     literal
 |     IDENTIFIER
 |     LPAREN! expr RPAREN!
 ;

Michael

On 14 October 2011 11:42, Ross Bamford <roscoml at gmail.com> wrote:
> Hi Michael,
> Thanks for the response! And thanks for being kind about my basic grammar
> :)
> I tried reordering the alternatives in expr as you suggested, and am a bit
> closer now than I was before! It's definitely parsing a = 1 + (b = 2) fine,
> but I'm still seeing NoViableAltExceptions with, for example "a=b+(c=2)".
> Looking at the debugger step by step it seems to still be trying to grab
> "b+" as a token, rather than seeing the "b" then the "+", which is why I
> tried adding IDENTIFIER to the "atom" rule previously. I tried adding it
> again after making the change you suggested but it still caused a lot of
> problems in other places.
> Thanks,
> Ross
>
> On Fri, Oct 14, 2011 at 1:04 AM, Michael Bedward <michael.bedward at gmail.com>
> wrote:
>>
>> Hi Ross,
>>
>> For a bit of a newbie that's a nice grammar - much neater than any of mine
>> :)
>>
>> If you rearrange your expr rule so that the assign_expr is the first
>> alternative...
>>
>> expr
>>  :   assign_expr
>>  |   math_expr
>>  |   meth_call_expr
>>  ;
>>
>> ...I think that the grammar should be able to parse things like a = 1 + (b
>> = 2)
>>
>> Michael
>>
>>
>> On 14 October 2011 10:38, Ross Bamford <roscoml at gmail.com> wrote:
>> > Hi Guys,
>> >
>> > I'm a bit of an Antlr newbie - I've successfully created and used Antlr
>> > 2
>> > grammars in the past but mostly by trial and error, and occasionally
>> > random
>> > hacking until it "worked"... I've recently become involved in a project
>> > that
>> > requires a very simple scripting language, and have decided to use Antlr
>> > 3
>> > for this, but I'm getting stuck quite early on - I think I have a
>> > fundamental problem in my grammar but after much hacking at it and
>> > trying
>> > various ideas I got from Google, I'm still hitting a bit of a brick
>> > wall.
>> >
>> > Basically I'm at the point where I have mathematical expressions and
>> > various
>> > literal types implemented, and am adding in function and method call
>> > handling - I want to be able to call methods with or without and
>> > explicit
>> > receiver, and in my language parenthesis are optional (I know that
>> > complicates matters a bit but it's what I need for this project). I've
>> > written the grammar so far against a set of functional tests, and all is
>> > well with most of my syntax. Here is my grammar:
>> >
>> > /* ********* GRAMMAR *********** */
>> > grammar BasicLang;
>> >
>> > options {
>> >    output=AST;
>> >    ASTLabelType=CommonTree;
>> >    backtrack=true;
>> >    memoize=true;
>> > }
>> >
>> > tokens {
>> >  ASSIGN;
>> >  METHOD_CALL;
>> >  SELF;
>> > }
>> >
>> > @parser::members {
>> >  /* throw exceptions rather than silently failing... */
>> > protected void mismatch(IntStream input, int ttype, BitSet follow)
>> >  throws RecognitionException
>> > {
>> >  throw new MismatchedTokenException(ttype, input);
>> > }
>> >  public Object recoverFromMismatchedSet(IntStream input,
>> > RecognitionException e, BitSet follow)
>> >  throws RecognitionException
>> > {
>> >  throw e;
>> > }
>> > }
>> >
>> > @rulecatch {
>> > // throw exceptions rather than silently failing...
>> > catch (RecognitionException e) {
>> >  throw e;
>> > }
>> > }
>> >
>> > start_rule
>> >  :   script
>> >  ;
>> >
>> > script
>> >  :   statement*
>> >  ;
>> >
>> > statement
>> >  :   expr terminator!
>> >  ;
>> >
>> > expr
>> >  :   math_expr
>> >  |   assign_expr
>> >  |   meth_call_expr
>> >  ;
>> >
>> > meth_call_expr
>> >  :   (IDENTIFIER DOT)? func_call_expr -> ^(METHOD_CALL IDENTIFIER?
>> > func_call_expr)
>> >  |   (STRING_LITERAL DOT)? func_call_expr -> ^(METHOD_CALL
>> > STRING_LITERAL?
>> > func_call_expr)
>> >  ;
>> >
>> > fragment
>> > func_call_expr
>> >  :   IDENTIFIER^ argument_list
>> >  ;
>> >
>> > fragment
>> > argument_list
>> >  :   LPAREN!? (expr (COMMA! expr)*)? RPAREN!?
>> >  ;
>> >
>> > assign_expr
>> >  :   IDENTIFIER ASSIGN expr -> ^(ASSIGN IDENTIFIER expr)
>> >  ;
>> >
>> > math_expr
>> >  :   mult_expr ((ADD^|SUB^) mult_expr)*
>> >  ;
>> >
>> > mult_expr
>> >  :   pow_expr ((MUL^|DIV^|MOD^) pow_expr)*
>> >  ;
>> >
>> > pow_expr
>> >  :   unary_expr ((POW^) unary_expr)*
>> >  ;
>> >
>> > unary_expr
>> >  :   NOT? atom
>> >  ;
>> >
>> > atom
>> >  :     literal
>> >  |     LPAREN! expr RPAREN!
>> >  ;
>> >
>> > literal
>> >  :     HEX_LITERAL
>> >  |     DECIMAL_LITERAL
>> >  |     OCTAL_LITERAL
>> >  |     FLOATING_POINT_LITERAL
>> > //  |     REGEXP_LITERAL
>> >  |     STRING_LITERAL
>> >  ;
>> >
>> > terminator
>> >  :     TERMINATOR
>> >  |     EOF
>> >  ;
>> >
>> > POW :   '^' ;
>> > MOD :   '%' ;
>> > ADD :   '+' ;
>> > SUB :   '-' ;
>> > DIV :   '/' ;
>> > MUL :   '*' ;
>> > NOT :   '!' ;
>> >
>> > ASSIGN
>> >    :   '='
>> >    ;
>> >
>> > LPAREN
>> >    :   '('
>> >    ;
>> >
>> > RPAREN
>> >    :   ')'
>> >    ;
>> >
>> > COMMA
>> >    :   ','
>> >    ;
>> >
>> > DOT :   '.' ;
>> >
>> > CHARACTER_LITERAL
>> >    :   '\'' ( EscapeSequence | ~('\''|'\\') ) '\''
>> >    ;
>> >
>> > STRING_LITERAL
>> >    :  '"' ( EscapeSequence | ~('\\'|'"') )* '"'
>> >    ;
>> >
>> > /*
>> > REGEXP_LITERAL
>> >    :  '/' ( EscapeSequence | ~('\\'|'"') )* '/'
>> >    ;
>> > */
>> >
>> > HEX_LITERAL : '0' ('x'|'X') HexDigit+ IntegerTypeSuffix? ;
>> >
>> > DECIMAL_LITERAL : ('0' | '1'..'9' '0'..'9'*) IntegerTypeSuffix? ;
>> >
>> > OCTAL_LITERAL : '0' ('0'..'7')+ IntegerTypeSuffix? ;
>> >
>> > fragment
>> > HexDigit : ('0'..'9'|'a'..'f'|'A'..'F') ;
>> >
>> > fragment
>> > IntegerTypeSuffix
>> >  : ('l'|'L')
>> >  | ('u'|'U')  ('l'|'L')?
>> >  ;
>> >
>> > FLOATING_POINT_LITERAL
>> >    :   ('0'..'9')+ '.' ('0'..'9')* Exponent? FloatTypeSuffix?
>> >    |   '.' ('0'..'9')+ Exponent? FloatTypeSuffix?
>> >    |   ('0'..'9')+ Exponent? FloatTypeSuffix?
>> >  ;
>> >
>> > fragment
>> > Exponent : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
>> >
>> > fragment
>> > FloatTypeSuffix : ('f'|'F'|'d'|'D') ;
>> >
>> > fragment
>> > EscapeSequence
>> >    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\'|'/')
>> >    |   OctalEscape
>> >    ;
>> >
>> > fragment
>> > OctalEscape
>> >    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
>> >    |   '\\' ('0'..'7') ('0'..'7')
>> >    |   '\\' ('0'..'7')
>> >    ;
>> >
>> > fragment
>> > UnicodeEscape
>> >    :   '\\' 'u' HexDigit HexDigit HexDigit HexDigit
>> >    ;
>> > COMMENT
>> >    :   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
>> >    ;
>> >
>> > LINE_COMMENT
>> >    : '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
>> >    ;
>> >
>> > IDENTIFIER
>> >  : ID_LETTER (ID_LETTER|'0'..'9')*
>> >  ;
>> >
>> > fragment
>> > ID_LETTER
>> >  : '$'
>> >  | 'A'..'Z'
>> >  | 'a'..'z'
>> >  | '_'
>> >  ;
>> >
>> > TERMINATOR
>> >  : '\r'? '\n'
>> >  | ';'
>> >  ;
>> >
>> > WS  :  (' '|'\r'|'\t'|'\u000C') {$channel=HIDDEN;}
>> >    |  '...' '\r'? '\n'  {$channel=HIDDEN;}
>> >    ;
>> >
>> > /* *************** END *************** */
>> >
>> > With this grammar, my tests so far pass, and I'm building trees for
>> > simple
>> > arithmetic operations and the like, including involving variables (e.g.
>> > a+1
>> > and the like), and method calls are working as I expect, including when
>> > passing method call results as args to another method call. But I cannot
>> > get
>> > input such as "a=b+(c=1)" to parse at all - Debugging in AntlrWorks
>> > shows me
>> > that the problem occurs when the parse sees the "b+", when it throws a
>> > NoViableAlt exception.
>> >
>> > I guessed this was because the parser doesn't see the identifier as an
>> > atom,
>> > so tries to parse it with the + symbol. So, I tried adding IDENTIFIER as
>> > an
>> > alternative to the atom rule - but that just broke the parser completely
>> > and
>> > many of my tests failed with an exception - MismatchedSetException.
>> >
>> > I've been playing with this for a few days now but no matter what I do,
>> > even
>> > when I get the type of syntax I mentioned above (the assign statement)
>> > working, I invariably break something (or more often, everything! :( )
>> > else.
>> > I'm really hoping someone out there will take pity on me and give me
>> > some
>> > insight into what I'm doing wrong.
>> >
>> > Thanks in advance!
>> > --
>> > Ross Bamford - roscoml at gmail.com
>> >
>> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> > Unsubscribe:
>> > http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>> >
>
>