[antlr-interest] Non-determinism (was: Can I force a token to have precedence in the lexer?)

Bart Kiers bkiers at gmail.com
Tue Apr 20 23:21:43 PDT 2010


On Wed, Apr 21, 2010 at 2:41 AM, Andy Hull <andyh at sunrunhome.com> wrote:

> Wow, thanks for the article. I was able to redefine the language to avoid
> the problem in order to keep the parser as simple as possible (now using
> "to" instead of "..." ).
>
> My parser needs to be able to handle nested array expressions like so
>
> {1,2,{5 to 10}, {3,6,9}, 4}
>
> I have the following grammar:
>
> arrayExpression
> :   LEFT_BRACKET! arrayInitializer? RIGHT_BRACKET!;
> arrayInitializer
> :  (e+=expression (',' e+=expression)*)+ -> ^(ELEMENTLIST $e*)
> |  expression AUTO expression -> ^(AUTO expression expression)
> ;
>
> expression
> : arrayExpression
> /* | other types of expression */
> ;
>
> This gives the expected non-LL(*) error, because "arrayInitializer" depends
> on the recursive rule "expression". I expected setting backtrack to true to
> resolve this, but it doesn't.
>
> x={1,2,3,4};
>
> yields the correct tree but...
>
> x={1 to 3};
>
> yields the error:
>
> BR.recoverFromMismatchedToken
> line 1:5 mismatched input 'to' expecting RIGHT_BRACKET
>
> arrayInitializer behaves as expected when it contains only a single subrule
> (either the element list or the range initializer).
>
> Is backtracking the right solution to the non-determinism? Am I doing
> something wrong?
>

How about something like this:

grammar Test;

parse
  : array ';' EOF
  ;

array
  :  '{' (arrayAtom (',' arrayAtom)*)? '}'
  ;

arrayAtom
  :  Number
  |  array
  |  range
  ;

range
  :  Number 'to' Number
  ;

Number
  :  '0'..'9'+
  ;

Space
  :  (' ' | '\t' | '\r' | '\n') {skip();}
  ;
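
To illustrate why the grammar above should not need backtracking, here is a
rough hand-written sketch of the same rules in Python (not ANTLR output, just
an approximation of the decisions the parser has to make). The point is that
arrayAtom can choose between Number and range by peeking one token past the
Number. The node names ELEMENTLIST and RANGE, the Parser class and the
parse_text helper are made up for this example.

```python
import re

# Tokens mirroring the lexer: Number, the 'to' keyword, and punctuation.
TOKEN_RE = re.compile(r"\s*(\d+|to|[{},;])")


def tokenize(text):
    """Split the input into tokens, skipping whitespace like the Space rule."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at {pos}: {text[pos]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self, offset=0):
        i = self.pos + offset
        return self.tokens[i] if i < len(self.tokens) else None

    def expect(self, value):
        tok = self.peek()
        if tok != value:
            raise SyntaxError(f"mismatched input {tok!r} expecting {value!r}")
        self.pos += 1

    def number(self):
        tok = self.peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"mismatched input {tok!r} expecting Number")
        self.pos += 1
        return int(tok)

    def parse(self):
        # parse : array ';' EOF
        tree = self.array()
        self.expect(";")
        if self.peek() is not None:
            raise SyntaxError(f"extraneous input {self.peek()!r}")
        return tree

    def array(self):
        # array : '{' (arrayAtom (',' arrayAtom)*)? '}'
        self.expect("{")
        items = []
        if self.peek() != "}":
            items.append(self.array_atom())
            while self.peek() == ",":
                self.pos += 1
                items.append(self.array_atom())
        self.expect("}")
        return ("ELEMENTLIST", items)

    def array_atom(self):
        # arrayAtom : Number | array | range
        if self.peek() == "{":
            return self.array()
        # Two tokens of lookahead: a Number followed by 'to' starts a range.
        if self.peek(1) == "to":
            return self.range()
        return self.number()

    def range(self):
        # range : Number 'to' Number
        lo = self.number()
        self.expect("to")
        hi = self.number()
        return ("RANGE", lo, hi)


def parse_text(text):
    """Hypothetical convenience wrapper: tokenize, then parse."""
    return Parser(tokenize(text)).parse()
```

Calling parse_text("{1 to 3};") in this sketch takes the range branch instead
of failing on 'to', and nested input such as "{1,2,{5 to 10}, {3,6,9}, 4};"
recurses through array for each inner brace pair.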

Regards,

Bart.

