[antlr-interest] Can anyone help with a basic grammar problem in Antlr 3?
Michael Bedward
michael.bedward at gmail.com
Thu Oct 13 18:12:00 PDT 2011
Ah yes, it's getting stuck on the "b" because you haven't told it that
identifiers are atoms...
atom
    : literal
    | IDENTIFIER
    | LPAREN! expr RPAREN!
    ;
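
To see the effect outside ANTLR, here is a small hand-written Java sketch of the same idea. It is not ANTLR-generated code and not the grammar from this thread: it tries assignment first in expr and accepts a bare identifier as an atom. The class name AtomSketch, the regex tokenizer and the reduced operator set are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical recursive-descent sketch; rule names mirror the grammar only loosely.
class AtomSketch {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    AtomSketch(String input) {
        // Identifiers, decimal integers and single-character operators.
        // Anything else (e.g. whitespace) is simply skipped in this sketch.
        Matcher m = Pattern.compile("[A-Za-z_$][A-Za-z_$0-9]*|[0-9]+|[-+*/=()]").matcher(input);
        while (m.find()) tokens.add(m.group());
    }

    private String peek(int offset) {
        int i = pos + offset;
        return i < tokens.size() ? tokens.get(i) : "<EOF>";
    }

    private static boolean isIdentifier(String t) {
        char c = t.charAt(0);
        return Character.isLetter(c) || c == '_' || c == '$';
    }

    // expr : assign_expr | math_expr   (assignment is tried first)
    String expr() {
        if (isIdentifier(peek(0)) && peek(1).equals("=")) {
            String name = tokens.get(pos);
            pos += 2; // consume IDENTIFIER and '='
            return "(= " + name + " " + expr() + ")";
        }
        return mathExpr();
    }

    // math_expr : mult_expr ((ADD|SUB) mult_expr)*
    String mathExpr() {
        String left = multExpr();
        while (peek(0).equals("+") || peek(0).equals("-")) {
            String op = tokens.get(pos++);
            left = "(" + op + " " + left + " " + multExpr() + ")";
        }
        return left;
    }

    // mult_expr : atom ((MUL|DIV) atom)*
    String multExpr() {
        String left = atom();
        while (peek(0).equals("*") || peek(0).equals("/")) {
            String op = tokens.get(pos++);
            left = "(" + op + " " + left + " " + atom() + ")";
        }
        return left;
    }

    // atom : literal | IDENTIFIER | LPAREN expr RPAREN
    String atom() {
        String t = peek(0);
        if (t.equals("(")) {
            pos++;
            String inner = expr();
            if (!peek(0).equals(")")) {
                throw new IllegalStateException("expected ')' but found " + peek(0));
            }
            pos++;
            return inner;
        }
        // The identifier test here is the branch that corresponds to adding IDENTIFIER to atom.
        if (isIdentifier(t) || Character.isDigit(t.charAt(0))) {
            pos++;
            return t;
        }
        throw new IllegalStateException("no viable alternative at " + t);
    }

    static String parse(String input) {
        AtomSketch p = new AtomSketch(input);
        String tree = p.expr();
        if (p.pos != p.tokens.size()) {
            throw new IllegalStateException("unexpected token " + p.peek(0));
        }
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(parse("a=b+(c=2)"));       // prints (= a (+ b (= c 2)))
        System.out.println(parse("a = 1 + (b = 2)")); // prints (= a (+ 1 (= b 2)))
    }
}
```

If the identifier test in atom() is removed, the first input fails at "b" with "no viable alternative", which is roughly the symptom described in this thread.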
Michael
On 14 October 2011 11:42, Ross Bamford <roscoml at gmail.com> wrote:
> Hi Michael,
> Thanks for the response! And thanks for being kind about my basic grammar :)
> I tried reordering the alternatives in expr as you suggested, and am a bit
> closer now than I was before! It's definitely parsing a = 1 + (b = 2) fine,
> but I'm still seeing NoViableAltExceptions with, for example "a=b+(c=2)".
> Looking at the debugger step by step it seems to still be trying to grab
> "b+" as a token, rather than seeing the "b" then the "+", which is why I
> tried adding IDENTIFIER to the "atom" rule previously. I tried adding it
> again after making the change you suggested but it still caused a lot of
> problems in other places.
> Thanks,
> Ross
>
> On Fri, Oct 14, 2011 at 1:04 AM, Michael Bedward <michael.bedward at gmail.com>
> wrote:
>>
>> Hi Ross,
>>
>> For a bit of a newbie that's a nice grammar - much neater than any of
>> mine :)
>>
>> If you rearrange your expr rule so that the assign_expr is the first
>> alternative...
>>
>> expr
>> : assign_expr
>> | math_expr
>> | meth_call_expr
>> ;
>>
>> ...I think that the grammar should be able to parse things like
>> a = 1 + (b = 2)
>>
>> Michael
>>
>>
>> On 14 October 2011 10:38, Ross Bamford <roscoml at gmail.com> wrote:
>> > Hi Guys,
>> >
>> > I'm a bit of an Antlr newbie - I've successfully created and used Antlr 2
>> > grammars in the past, but mostly by trial and error, and occasionally
>> > random hacking until it "worked"... I've recently become involved in a
>> > project that requires a very simple scripting language, and have decided
>> > to use Antlr 3 for this, but I'm getting stuck quite early on - I think I
>> > have a fundamental problem in my grammar, but after much hacking at it and
>> > trying various ideas I got from Google, I'm still hitting a bit of a brick
>> > wall.
>> >
>> > Basically I'm at the point where I have mathematical expressions and
>> > various literal types implemented, and am adding in function and method
>> > call handling - I want to be able to call methods with or without an
>> > explicit receiver, and in my language parentheses are optional (I know
>> > that complicates matters a bit but it's what I need for this project).
>> > I've written the grammar so far against a set of functional tests, and
>> > all is well with most of my syntax. Here is my grammar:
>> >
>> > /* ********* GRAMMAR *********** */
>> > grammar BasicLang;
>> >
>> > options {
>> > output=AST;
>> > ASTLabelType=CommonTree;
>> > backtrack=true;
>> > memoize=true;
>> > }
>> >
>> > tokens {
>> > ASSIGN;
>> > METHOD_CALL;
>> > SELF;
>> > }
>> >
>> > @parser::members {
>> > /* throw exceptions rather than silently failing... */
>> > protected void mismatch(IntStream input, int ttype, BitSet follow)
>> > throws RecognitionException
>> > {
>> > throw new MismatchedTokenException(ttype, input);
>> > }
>> > public Object recoverFromMismatchedSet(IntStream input,
>> > RecognitionException e, BitSet follow)
>> > throws RecognitionException
>> > {
>> > throw e;
>> > }
>> > }
>> >
>> > @rulecatch {
>> > // throw exceptions rather than silently failing...
>> > catch (RecognitionException e) {
>> > throw e;
>> > }
>> > }
>> >
>> > start_rule
>> > : script
>> > ;
>> >
>> > script
>> > : statement*
>> > ;
>> >
>> > statement
>> > : expr terminator!
>> > ;
>> >
>> > expr
>> > : math_expr
>> > | assign_expr
>> > | meth_call_expr
>> > ;
>> >
>> > meth_call_expr
>> > : (IDENTIFIER DOT)? func_call_expr -> ^(METHOD_CALL IDENTIFIER? func_call_expr)
>> > | (STRING_LITERAL DOT)? func_call_expr -> ^(METHOD_CALL STRING_LITERAL? func_call_expr)
>> > ;
>> >
>> > fragment
>> > func_call_expr
>> > : IDENTIFIER^ argument_list
>> > ;
>> >
>> > fragment
>> > argument_list
>> > : LPAREN!? (expr (COMMA! expr)*)? RPAREN!?
>> > ;
>> >
>> > assign_expr
>> > : IDENTIFIER ASSIGN expr -> ^(ASSIGN IDENTIFIER expr)
>> > ;
>> >
>> > math_expr
>> > : mult_expr ((ADD^|SUB^) mult_expr)*
>> > ;
>> >
>> > mult_expr
>> > : pow_expr ((MUL^|DIV^|MOD^) pow_expr)*
>> > ;
>> >
>> > pow_expr
>> > : unary_expr ((POW^) unary_expr)*
>> > ;
>> >
>> > unary_expr
>> > : NOT? atom
>> > ;
>> >
>> > atom
>> > : literal
>> > | LPAREN! expr RPAREN!
>> > ;
>> >
>> > literal
>> > : HEX_LITERAL
>> > | DECIMAL_LITERAL
>> > | OCTAL_LITERAL
>> > | FLOATING_POINT_LITERAL
>> > // | REGEXP_LITERAL
>> > | STRING_LITERAL
>> > ;
>> >
>> > terminator
>> > : TERMINATOR
>> > | EOF
>> > ;
>> >
>> > POW : '^' ;
>> > MOD : '%' ;
>> > ADD : '+' ;
>> > SUB : '-' ;
>> > DIV : '/' ;
>> > MUL : '*' ;
>> > NOT : '!' ;
>> >
>> > ASSIGN
>> > : '='
>> > ;
>> >
>> > LPAREN
>> > : '('
>> > ;
>> >
>> > RPAREN
>> > : ')'
>> > ;
>> >
>> > COMMA
>> > : ','
>> > ;
>> >
>> > DOT : '.' ;
>> >
>> > CHARACTER_LITERAL
>> > : '\'' ( EscapeSequence | ~('\''|'\\') ) '\''
>> > ;
>> >
>> > STRING_LITERAL
>> > : '"' ( EscapeSequence | ~('\\'|'"') )* '"'
>> > ;
>> >
>> > /*
>> > REGEXP_LITERAL
>> > : '/' ( EscapeSequence | ~('\\'|'"') )* '/'
>> > ;
>> > */
>> >
>> > HEX_LITERAL : '0' ('x'|'X') HexDigit+ IntegerTypeSuffix? ;
>> >
>> > DECIMAL_LITERAL : ('0' | '1'..'9' '0'..'9'*) IntegerTypeSuffix? ;
>> >
>> > OCTAL_LITERAL : '0' ('0'..'7')+ IntegerTypeSuffix? ;
>> >
>> > fragment
>> > HexDigit : ('0'..'9'|'a'..'f'|'A'..'F') ;
>> >
>> > fragment
>> > IntegerTypeSuffix
>> > : ('l'|'L')
>> > | ('u'|'U') ('l'|'L')?
>> > ;
>> >
>> > FLOATING_POINT_LITERAL
>> > : ('0'..'9')+ '.' ('0'..'9')* Exponent? FloatTypeSuffix?
>> > | '.' ('0'..'9')+ Exponent? FloatTypeSuffix?
>> > | ('0'..'9')+ Exponent? FloatTypeSuffix?
>> > ;
>> >
>> > fragment
>> > Exponent : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
>> >
>> > fragment
>> > FloatTypeSuffix : ('f'|'F'|'d'|'D') ;
>> >
>> > fragment
>> > EscapeSequence
>> > : '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\'|'/')
>> > | OctalEscape
>> > ;
>> >
>> > fragment
>> > OctalEscape
>> > : '\\' ('0'..'3') ('0'..'7') ('0'..'7')
>> > | '\\' ('0'..'7') ('0'..'7')
>> > | '\\' ('0'..'7')
>> > ;
>> >
>> > fragment
>> > UnicodeEscape
>> > : '\\' 'u' HexDigit HexDigit HexDigit HexDigit
>> > ;
>> > COMMENT
>> > : '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
>> > ;
>> >
>> > LINE_COMMENT
>> > : '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
>> > ;
>> >
>> > IDENTIFIER
>> > : ID_LETTER (ID_LETTER|'0'..'9')*
>> > ;
>> >
>> > fragment
>> > ID_LETTER
>> > : '$'
>> > | 'A'..'Z'
>> > | 'a'..'z'
>> > | '_'
>> > ;
>> >
>> > TERMINATOR
>> > : '\r'? '\n'
>> > | ';'
>> > ;
>> >
>> > WS : (' '|'\r'|'\t'|'\u000C') {$channel=HIDDEN;}
>> > | '...' '\r'? '\n' {$channel=HIDDEN;}
>> > ;
>> >
>> > /* *************** END *************** */
>> >
>> > With this grammar, my tests so far pass, and I'm building trees for
>> > simple arithmetic operations, including ones involving variables (e.g.
>> > a+1), and method calls are working as I expect, including when passing
>> > method call results as args to another method call. But I cannot get
>> > input such as "a=b+(c=1)" to parse at all - debugging in AntlrWorks
>> > shows me that the problem occurs when the parser sees the "b+", at which
>> > point it throws a NoViableAltException.
>> >
>> > I guessed this was because the parser doesn't see the identifier as an
>> > atom, so tries to parse it with the + symbol. So, I tried adding
>> > IDENTIFIER as an alternative to the atom rule - but that just broke the
>> > parser completely and many of my tests failed with an exception -
>> > MismatchedSetException.
>> >
>> > I've been playing with this for a few days now but no matter what I do,
>> > even when I get the type of syntax I mentioned above (the assign
>> > statement) working, I invariably break something else (or more often,
>> > everything! :( ). I'm really hoping someone out there will take pity on
>> > me and give me some insight into what I'm doing wrong.
>> >
>> > Thanks in advance!
>> > --
>> > Ross Bamford - roscoml at gmail.com