[antlr-interest] Grammar puzzle....

Thu Jul 12 06:59:57 PDT 2007

I made some progress on this problem:
apparently in this rule:
========
identifier: ( xaml=ID COLON )? id0=ID
  ( DOT id+=ID )*
  -> ^( ID[$id0] ^( XAMLNS[$xaml] ) ^( ID[$id0] $id0 /*$id+*/ )  )
 ;
========
the "(xaml=ID COLON)?" part is optional, yet the tree rewrite references $xaml unconditionally; that *might be* the cause of the trouble.

I guess I could *solve* the problem with 
  -> ^( ID[$id0] ^( XAMLNS[$xaml] )?  ^( ID[$id0] $id0 /*$id+*/ )  )
instead of
  -> ^( ID[$id0] ^( XAMLNS[$xaml] ) ^( ID[$id0] $id0 /*$id+*/ )  )

but that's no good: for my generated tree to be easy to use, I always want the 2nd member to be the XAMLNS node, even when it is empty.
How could I rewrite my rule so that the tree always contains such a node, whether or not it has been found?
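
One possible direction, offered as an untested sketch rather than a confirmed fix: ANTLR 3 rewrites can be made conditional with a predicate, so the rule could choose between two tree shapes. The sketch assumes that a bare XAMLNS in a rewrite creates an imaginary placeholder node when no namespace token was matched; the text of that placeholder may depend on the target runtime.
========
identifier: ( xaml=ID COLON )? id0=ID
  ( DOT id+=ID )*
  // namespace prefix present: build the XAMLNS node from the matched token
  -> {$xaml != null}? ^( ID[$id0] ^( XAMLNS[$xaml] ) ^( ID[$id0] $id0 /*$id+*/ )  )
  // no prefix: still emit an (empty) imaginary XAMLNS node as the first child
  ->                  ^( ID[$id0] XAMLNS             ^( ID[$id0] $id0 /*$id+*/ )  )
 ;
========
In both alternatives an XAMLNS node sits in the same position, so code walking the tree would not need to special-case the missing prefix.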

  ----- Original Message ----- 
  From: Lloyd Dupont 
  To: antlr-interest at antlr.org 
  Sent: Thursday, July 12, 2007 11:13 PM
  Subject: [antlr-interest] Grammar puzzle....

  I am trying to parse the following input:
  ===========
  3 & 4 + a is c + 4 & 3
  ===========

  with the following grammar:
  ===========
  grammar TreeTest;

  options {output=AST;}
  tokens
  {
   IS='is';
   XAMLNS;
  }

  expression: logical ;

  logical : compare (LOR^ | LAND^ compare)* ;

  compare : (additive -> additive)
    (
     ( op=(LT | GT) s=additive -> ^($op $compare $s) )
    | is='is' i=additive -> ^(IS[$is] $compare $i)
    )?
    ;

  additive: multiple ((PLUS^ | MINUS^) multiple)* ;

  multiple: atom ((MULT^ | DIV^) atom)* ;

  atom :  identifier | INT;

  identifier 
   : ( xaml=ID COLON )? id0=ID
    ( DOT id+=ID )*
    -> ^( ID[$id0] ^( XAMLNS[$xaml] ) ^( ID[$id0] $id0 $id+ )  )
   ;

  ID : 'a'..'z' + ;
  INT : '0'..'9' +;
  PLUS :  '+';
  MINUS : '-';
  MULT :  '*';
  DIV : '/';
  LAND : '&';
  LOR : '|';
  LT : '<';
  GT : '>';
  DOT : '.';
  COMMA : ',';
  COLON : ':';
  WS : (' ' |'\n' |'\r' ) {$channel=HIDDEN;} ;
  ===========

  Parsing stops just before 'is', i.e. I can only parse "3 & 4 + a".
  I can't understand why.

  What seems even more mysterious to me is that if I simplify my 'identifier' rule to this:
  ===========
  identifier: ID;
  ===========

  I can parse all of my input.

  For the life of me I can't understand why the previous syntax for the 'identifier' rule prevents 'is' from being parsed.

  Any tip?!?!