[antlr-interest] Having trouble with creating a parser for my desired grammar. Running afoul of multiple alternatives warnings

Tue Nov 15 14:46:45 PST 2011

Greetings!

I think you have issues with your function, number, and ATOM rules. See
below.

I have attached my complete, modified grammar, which successfully parses
your input sample.

On 11/14/2011 11:47 PM, Jarrod Roberson wrote:
> I am trying to write a parser for the following syntax
> 
> hypotenuse(a,b) ->
>   sqr(x) -> x * x,
>   sqr(sqr(b) + sqr(b)).
> 
> print(hypotenuse(2,3)).
> 
> Where , and . are my statement separator and statement eol respectively.
> 
> I am having an impossible time trying to figure out how to specify the
> function rule to allow me to nest functions inside of other functions
> without running afoul of ambiguities warnings.
> 
> [23:37:47] warning(200): funcy.g:10:11: Decision can match input such as
> "ID" using multiple alternatives: 1, 2
> As a result, alternative(s) 2 were disabled for that input
> [23:37:47] error(201): funcy.g:10:11: The following alternatives can never
> be matched: 2
> 
> 
> I really want to be able to use the above syntax without having to pepper
> the code with keywords like `func` or `var` etc.
> 
> Here is my grammar, are there any ways to resolve these ambiguities with
> predicates of some sort that I haven't been able to figure out?
> 
> I have read up on Google about them, but I can't get them to work with the
> parser rules to remove the ambiguities.
> 
> grammar funcy;
> 
> options {
>     output = AST;
>     language = Java;
> }
> program : (statement'.')* ;

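Just a nitpick here: you really should include EOF in your topmost rule.
Without it, the (...)* loop can simply exit at the first token that does
not start a statement, and the rest of the input is silently ignored.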
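
To illustrate the EOF point outside of ANTLR, here is a minimal Python sketch of a hand-written loop for a rule shaped like program. It is a toy, not generated parser code: the function parse_program, its require_eof flag, and the one-identifier "statement" are all assumptions made for the example.

```python
# Toy recursive-descent version of:  program : (statement '.')* ;
# Assumption: a "statement" is a single identifier token. This is only
# meant to show the loop-exit behaviour, not the grammar from this thread.

def parse_program(tokens, require_eof):
    """Return the number of tokens consumed; raise SyntaxError on leftovers
    when require_eof is True."""
    pos = 0
    # (statement '.')* : keep going while the input looks like "ident ."
    while (pos + 1 < len(tokens)
           and tokens[pos].isidentifier()
           and tokens[pos + 1] == "."):
        pos += 2
    # Equivalent of appending EOF to the rule: nothing may be left over.
    if require_eof and pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r} at index {pos}")
    return pos


tokens = ["a", ".", "b", ".", "42", "oops"]

# Without the EOF check the loop exits at "42" and the parse "succeeds".
consumed_without_eof = parse_program(tokens, require_eof=False)

# With the EOF check the leftover input is reported instead.
try:
    parse_program(tokens, require_eof=True)
    eof_error = None
except SyntaxError as exc:
    eof_error = str(exc)
```

Real ANTLR error reporting and recovery is more involved than this, but the loop-exit behaviour is the reason for ending the start rule with EOF.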

> 
> statement : expression
>           | assignment
>           ;
> 
> assignment : ID '->' expression
>            | ATOM '->' ( string | number )
>            | function '->' statement ((','statement)=> ',' statement)* ;

I think you are being too liberal here with your function signatures.
You permit any expression to be a formal argument. Are you intending to
have patterns akin to those in ML or Haskell? If not, change the
definition of function in your assignment rule.

I also think that this permits a multi-expression body, something like:

foo(a,b)-> a, b.

i.e. a function body consisting of two (or more) expressions. Do you
really want that? You do if your expressions can have side effects.

Maybe the third alt of the assignment rule should be something like this
(assuming you do not have side effects, and watch out for I/O!):

| ID '(' ID (',' ID)* ')' '->' (assignment ',')* expression ;

This eliminates the need for a predicate.

> 
> args : expression (',' expression)*;
> 
> function : ID '(' args ')' ;
> 
> string : UNICODE_STRING;
> number : HEX_NUMBER
>        | (INTEGER '.' INTEGER)=> INTEGER '.' INTEGER

I do not think you want to recognize floating point values in the
parser. Any tokens you send to the HIDDEN channel (or skip()) will be
silently accepted before and after the '.' of the float, so input such
as "1 . 5" would be accepted as a single number. Change your INTEGER
rule to this:

fragment FLOAT: ;
INTEGER : DIGIT+ ('.' DIGIT+ {$type=FLOAT;} )? ;

and use FLOAT in the number rule in place of the INTEGER '.' INTEGER
alternative.
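
The difference between building a float in the parser and in the lexer can be sketched with a small regex tokenizer in Python. This only approximates the INTEGER and WS rules; the names PARSER_LEVEL, LEXER_LEVEL, and the helper functions are made up for the illustration and are not ANTLR APIs.

```python
import re

# Assumed token sets. PARSER_LEVEL has no FLOAT token, so the parser must
# assemble a float from INTEGER '.' INTEGER. LEXER_LEVEL matches the whole
# float as one token, with no room for whitespace inside it.
PARSER_LEVEL = re.compile(r"(?P<WS>\s+)|(?P<INTEGER>\d+)|(?P<DOT>\.)")
LEXER_LEVEL = re.compile(
    r"(?P<WS>\s+)|(?P<FLOAT>\d+\.\d+)|(?P<INTEGER>\d+)|(?P<DOT>\.)"
)


def visible_token_types(pattern, text):
    """Tokenize text and return the types the parser would see.
    WS tokens are dropped, mimicking tokens sent to a hidden channel."""
    types = []
    pos = 0
    while pos < len(text):
        match = pattern.match(text, pos)
        if match is None:
            raise ValueError(f"cannot tokenize at index {pos}")
        if match.lastgroup != "WS":
            types.append(match.lastgroup)
        pos = match.end()
    return types


def is_float_parser_level(text):
    # Parser rule: INTEGER '.' INTEGER, applied to visible tokens only.
    return visible_token_types(PARSER_LEVEL, text) == ["INTEGER", "DOT", "INTEGER"]


def is_float_lexer_level(text):
    # Parser rule: a single FLOAT token produced by the lexer.
    return visible_token_types(LEXER_LEVEL, text) == ["FLOAT"]
```

With these definitions, is_float_parser_level("1 . 5") is true because the whitespace never reaches the parser, while is_float_lexer_level("1 . 5") is false and only "1.5" is treated as a float.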

>        | INTEGER;
> 
> // expressions
> 
> term : '(' expression ')'
>      | number
>      | string
>      | function
>      | ID
>      | ATOM
>      ;
> 
> negation : '!'* term;
> 
> unary : ('+'|'-')* negation;
> 
> mult : unary (('*' | '/' | ('%'|'mod') ) unary)*;
> 
> add : mult (('+' | '-') mult)*;
> 
> relation : add (('=' | '!=' | '<' | '<=' | '>=' | '>') add)*;
> expression : relation (('&&' | '||') relation)*;
> 
> // LEXER ================================================================
> 
> HEX_NUMBER : '0x' HEX_DIGIT+;
> 
> INTEGER : DIGIT+;
> 
> UNICODE_STRING : '"' ( ESC | ~('\u0000'..'\u001f' | '\\' | '\"' ) )* '"'
>                 ;
> 
> WS : (' '|'\n'|'\r'|'\t')+ {$channel = HIDDEN;} ; // ignore whitespace
> 
> fragment
> ESC : '\\' ( UNI_ESC |'b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\' );
> 
> fragment
> UNI_ESC : 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT;
> 
> fragment
> HEX_DIGIT : (DIGIT|'a'..'f'|'A'..'F') ;
> 
> fragment
> DIGIT : ('0'..'9');
> 
> ATOM : (('A'..'Z'|'_')+)=> ('A'..'Z'|'0'..'9'|'_')+;

No need for a predicate:

ATOM : ('A'..'Z')('A'..'Z'|'0'..'9'|'_')*;

Note that this also removes the ambiguity as to whether the string "_"
is an ATOM or an ID.
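
The "_" overlap can be checked with regular expressions that approximate the three lexer rules. This is a sketch under assumptions: the regexes are my translation of the rules, and a whole-string match only shows which rules could match a string, not which token an ANTLR lexer would actually emit.

```python
import re

# Approximate regex translations of the lexer rules (assumptions):
# original ATOM: predicate requires a leading A-Z or '_', body is [A-Z0-9_]+
ORIGINAL_ATOM = re.compile(r"(?=[A-Z_])[A-Z0-9_]+")
# revised ATOM: must start with an uppercase letter
REVISED_ATOM = re.compile(r"[A-Z][A-Z0-9_]*")
# ID: starts with a lowercase letter or '_'
ID = re.compile(r"[a-z_][a-zA-Z0-9_]*")


def matching_rules(text, atom_pattern):
    """Return the names of the rules that can match the whole string."""
    rules = []
    if atom_pattern.fullmatch(text):
        rules.append("ATOM")
    if ID.fullmatch(text):
        rules.append("ID")
    return rules


# With the original ATOM rule, "_" can be either token type.
original_underscore = matching_rules("_", ORIGINAL_ATOM)
# With the revised ATOM rule, "_" can only be an ID.
revised_underscore = matching_rules("_", REVISED_ATOM)
```

Strings such as "PI_2" still match only ATOM under both versions, so the revised rule narrows the overlap without changing ordinary atoms.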

> 
> ID : ('a'..'z'|'_')('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
> 
> COMMENT : '/*' .* '*/' {$channel = HIDDEN;};
> 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Test.g
Url: http://www.antlr.org/pipermail/antlr-interest/attachments/20111115/cd86aeaf/attachment.pl