[antlr-interest] Having trouble with creating a parser for my desired grammar. Running afoul of multiple alternatives warnings
John B. Brodie
jbb at acm.org
Tue Nov 15 14:46:45 PST 2011
Greetings!
I think you have issues with your function, number, and ATOM rules. See
below...
I have attached my complete, modified grammar, which successfully parses
your input sample.
On 11/14/2011 11:47 PM, Jarrod Roberson wrote:
> I am trying to write a parser for the following syntax
>
> hypotenuse(a,b) ->
> sqr(x) -> x * x,
> sqr(sqr(b) + sqr(b)).
>
> print(hypotenuse(2,3)).
>
> Where , and . are my statement separator and statement eol respectively.
>
> I am having an impossible time trying to figure out how to specify the
> function rule to allow me to nest functions inside of other functions
> without running afoul of ambiguities warnings.
>
> [23:37:47] warning(200): funcy.g:10:11: Decision can match input such as
> "ID" using multiple alternatives: 1, 2
> As a result, alternative(s) 2 were disabled for that input
> [23:37:47] error(201): funcy.g:10:11: The following alternatives can never
> be matched: 2
>
>
> I really want to be able to use the above syntax without having to pepper
> the code with keywords like `func` or `var` etc.
>
> Here is my grammar, are there any ways to resolve these ambiguities with
> predicates of some sort that I haven't been able to figure out?
>
> I have read up on Google about them, but I can't get them to work with the
> parser rules to remove the ambiguities.
>
> grammar funcy;
>
> options {
> output = AST;
> language = Java;
> }
> program : (statement'.')* ;
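Just a nitpick here: you really should include EOF in your topmost rule, e.g.
program : (statement '.')* EOF ;
Without it, the parser simply stops at the first token that cannot start a
statement and silently ignores the rest of the input.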
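To illustrate the EOF point, here is a small hand-written Java sketch (not ANTLR-generated code; the class and method names are hypothetical, and a "statement" is simplified to a single ID token) of how a (statement '.')* loop behaves with and without a final EOF check:

```java
import java.util.List;

// Hypothetical hand-written sketch (not ANTLR output) of
//   program : (statement '.')* EOF ;
// where, to keep it short, a "statement" is just a single ID token.
class EofSketch {
    static boolean isId(String tok) {
        return tok.matches("[a-z_][A-Za-z0-9_]*");
    }

    // Returns true if the token list is accepted as a program.
    // With requireEof == false the loop simply exits at the first token that
    // cannot start a statement, and whatever follows is never examined.
    static boolean parseProgram(List<String> tokens, boolean requireEof) {
        int i = 0;
        while (i < tokens.size() && isId(tokens.get(i))) { // does a statement start here?
            i++;                                           // match the statement
            if (i >= tokens.size() || !tokens.get(i).equals(".")) {
                return false;                              // syntax error: missing '.'
            }
            i++;                                           // match '.'
        }
        return !requireEof || i == tokens.size();          // the EOF check
    }
}
```

For the tokens a . b . ) junk, parseProgram(tokens, false) returns true even though ") junk" was never looked at, while parseProgram(tokens, true) returns false.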
>
> statement : expression
> | assignment
> ;
>
> assignment : ID '->' expression
> | ATOM '->' ( string | number )
> | function '->' statement ((','statement)=> ',' statement)* ;
I think you are being too liberal here with your function signatures:
you permit any expression to be a formal argument. Are you intending to
have patterns akin to either ML or Haskell? If not, change the
definition of function in your assignment rule.
I also think that this permits a multi-expression body, something like:
foo(a,b) -> a, b.
i.e. a function body consisting of two (or more) expressions. Do you
really want that? You do if your expressions can have side effects.
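Maybe the third alternative of the assignment rule should be something like this
(assuming you do not have side effects; and watch out for I/O!):
| ID '(' ID (',' ID)* ')' '->' (assignment ',')* expression ;
This eliminates the need for a predicate: the formal parameters are now a flat
list of IDs rather than arbitrary (recursive) expressions, so the parser can
look ahead to the '->' to tell a definition from a call.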
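As a rough analogy only (this is not what ANTLR generates, and the names are hypothetical), the Java sketch below shows the kind of lookahead decision the restricted header ID '(' ID (',' ID)* ')' '->' allows: because the parameter list never nests, a simple linear scan over the tokens reaches the '->' and can decide "definition" versus "call" on its own.

```java
import java.util.List;

// Hypothetical sketch: decide whether tokens starting at `start` form a
// function-definition header  ID '(' ID (',' ID)* ')' '->' .
// The parameter list is a flat list of IDs, so no recursion is needed.
class HeaderLookaheadSketch {
    static boolean isId(String tok) {
        return tok.matches("[a-z_][A-Za-z0-9_]*");
    }

    static boolean startsDefinition(List<String> t, int start) {
        int i = start;
        if (i >= t.size() || !isId(t.get(i++))) return false;      // function name
        if (i >= t.size() || !t.get(i++).equals("(")) return false;
        if (i >= t.size() || !isId(t.get(i++))) return false;      // first parameter
        while (i < t.size() && t.get(i).equals(",")) {              // (',' ID)*
            i++;
            if (i >= t.size() || !isId(t.get(i++))) return false;
        }
        if (i >= t.size() || !t.get(i++).equals(")")) return false;
        return i < t.size() && t.get(i).equals("->");               // definition only if '->' follows
    }
}
```

With this check, hypotenuse(a,b) -> is recognized as a definition, while the calls sqr(b) + ... and sqr(sqr(b)) are not, the latter because a nested call is not a plain ID parameter.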
>
> args : expression (',' expression)*;
>
> function : ID '(' args ')' ;
>
> string : UNICODE_STRING;
> number : HEX_NUMBER
> | (INTEGER '.' INTEGER)=> INTEGER '.' INTEGER
I do not think you want to recognize floating point values in the
parser. Any tokens you send to the HIDDEN channel ($channel = HIDDEN, or
skip()) will be silently accepted before and after the '.' of the float,
so input such as 2 . 5 would parse as a float. Change your INTEGER rule
to this:
fragment FLOAT: ;
INTEGER : DIGIT+ ('.' DIGIT+ {$type=FLOAT;} )? ;
and use FLOAT in the number rule, e.g.
number : HEX_NUMBER | FLOAT | INTEGER ;
> | INTEGER;
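Here is a hand-written Java sketch (not ANTLR-generated; class and method names are hypothetical) of what the suggested INTEGER rule buys you: whitespace is skipped only between tokens, never inside a number, so 2.5 becomes one FLOAT token while 2 . 5 stays three separate tokens. Unlike the plain rule above, this sketch also checks that a digit follows the '.' before taking the fractional part, so a statement-ending '.' after an integer remains its own token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of  INTEGER : DIGIT+ ('.' DIGIT+ {$type=FLOAT;})? ;
// Tokens are returned as "TYPE:text"; any other non-space char is "CHAR:c".
class NumberLexSketch {
    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    static List<String> lex(String in) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (Character.isWhitespace(c)) {          // stands in for WS -> HIDDEN
                i++;
            } else if (isDigit(c)) {
                int start = i;
                while (i < in.length() && isDigit(in.charAt(i))) i++;
                String type = "INTEGER";
                // Take the fraction only if '.' is immediately followed by a digit.
                if (i + 1 < in.length() && in.charAt(i) == '.' && isDigit(in.charAt(i + 1))) {
                    i++;                              // consume '.'
                    while (i < in.length() && isDigit(in.charAt(i))) i++;
                    type = "FLOAT";
                }
                out.add(type + ":" + in.substring(start, i));
            } else {
                out.add("CHAR:" + c);
                i++;
            }
        }
        return out;
    }
}
```

For example, lex("2.5") yields [FLOAT:2.5], lex("2 . 5") yields [INTEGER:2, CHAR:., INTEGER:5], and lex("x * 2.") ends with INTEGER:2 followed by CHAR:. for the statement terminator.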
>
> // expressions
>
> term : '(' expression ')'
> | number
> | string
> | function
> | ID
> | ATOM
> ;
>
> negation : '!'* term;
>
> unary : ('+'|'-')* negation;
>
> mult : unary (('*' | '/' | ('%'|'mod') ) unary)*;
>
> add : mult (('+' | '-') mult)*;
>
> relation : add (('=' | '!=' | '<' | '<=' | '>=' | '>') add)*;
> expression : relation (('&&' | '||') relation)*;
>
> // LEXER ================================================================
>
> HEX_NUMBER : '0x' HEX_DIGIT+;
>
> INTEGER : DIGIT+;
>
> UNICODE_STRING : '"' ( ESC | ~('\u0000'..'\u001f' | '\\' | '\"' ) )* '"'
> ;
>
> WS : (' '|'\n'|'\r'|'\t')+ {$channel = HIDDEN;} ; // ignore whitespace
>
> fragment
> ESC : '\\' ( UNI_ESC |'b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\' );
>
> fragment
> UNI_ESC : 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT;
>
> fragment
> HEX_DIGIT : (DIGIT|'a'..'f'|'A'..'F') ;
>
> fragment
> DIGIT : ('0'..'9');
>
> ATOM : (('A'..'Z'|'_')+)=> ('A'..'Z'|'0'..'9'|'_')+;
No need for a predicate:
ATOM : ('A'..'Z')('A'..'Z'|'0'..'9'|'_')*;
Note that this also removes the ambiguity as to whether the string "_"
is an ATOM or an ID: your original ATOM rule and your ID rule could both
match it, whereas now only ID can.
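A quick way to convince yourself that the first character alone now decides between ATOM and ID is this small Java sketch (hypothetical class; the regular expressions are simply transcriptions of the suggested ATOM rule and the existing ID rule). It checks whether a whole word would be a single ATOM or ID token.

```java
// Hypothetical sketch: classify a whole word under
//   ATOM : ('A'..'Z')('A'..'Z'|'0'..'9'|'_')* ;
//   ID   : ('a'..'z'|'_')('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
// The two rules start with disjoint characters, so at most one can match.
class AtomOrIdSketch {
    static String classify(String word) {
        if (word.matches("[A-Z][A-Z0-9_]*")) return "ATOM";
        if (word.matches("[a-z_][a-zA-Z0-9_]*")) return "ID";
        return "NEITHER"; // would lex as several tokens, or not at all
    }
}
```

Here "_" and "hypotenuse" classify as ID, "OK_2" as ATOM, and "Foo" as NEITHER (the ATOM rule would stop after "F").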
>
> ID : ('a'..'z'|'_')('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
>
> COMMENT : '/*' .* '*/' {$channel = HIDDEN;};
>
-------------- next part --------------
Name: Test.g
Url: http://www.antlr.org/pipermail/antlr-interest/attachments/20111115/cd86aeaf/attachment.pl