[antlr-interest] Parsing poorly terminated IF statements

Dean Shumsheruddin DShumsheruddin at rocketsoftware.com
Fri Mar 11 06:32:51 PST 2011


Hi Folks,

I'm using ANTLR to parse an old Fortran-like language with poorly terminated if statements.
Here is a simplified version of a block in the language. The indentation is not part of the language; I've added it only to show the structure:

print 1
if
  print 2
else
  print 3
endif
if
  print 4
if
  print 5
  do
    print 6
    if
      print 7
  next
  print 8

An 'if' construct may be terminated by 'endif', by a new 'if' construct, or by the end of the current block.
'if' constructs cannot be nested except via a do-next construct. Every 'do' is terminated by a matching 'next'.
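
The termination rules above can be sketched as a small hand-written recursive-descent parser. This is only an illustration in Python, not ANTLR output and not a suggested replacement for the grammar: the function name `parse`, the tuple-based tree shape, and the assumption of one command per line are all hypothetical choices made for the example.

```python
# Minimal sketch of the termination rules, assuming one command per line.
# Tree nodes (hypothetical shape): ("print", arg), ("do", body),
# ("if", then_body, else_body).

SAMPLE = """\
print 1
if
print 2
else
print 3
endif
if
print 4
if
print 5
do
print 6
if
print 7
next
print 8
"""


def parse(source):
    # Split each non-empty line into words; the first word is the keyword.
    lines = [ln.split() for ln in source.splitlines() if ln.strip()]
    pos = 0

    def peek():
        # Keyword of the current line, or None at end of input.
        return lines[pos][0] if pos < len(lines) else None

    def block():
        # A block runs until 'next' (closing an enclosing 'do') or end of input.
        out = []
        while peek() not in (None, "next"):
            out.append(command())
        return out

    def simple_commands():
        # Body of an 'if' or 'else' branch: only 'print' and 'do' commands.
        # It ends at 'if', 'else', 'endif', 'next', or end of input.
        out = []
        while peek() in ("print", "do"):
            out.append(command())
        return out

    def command():
        nonlocal pos
        kw = peek()
        if kw == "print":
            node = ("print", " ".join(lines[pos][1:]))
            pos += 1
            return node
        if kw == "do":
            pos += 1
            body = block()
            if peek() != "next":
                raise SyntaxError("'do' without matching 'next'")
            pos += 1
            return ("do", body)
        if kw == "if":
            pos += 1
            then_body = simple_commands()
            else_body = []
            if peek() == "else":
                pos += 1
                else_body = simple_commands()
            if peek() == "endif":  # the terminator is optional
                pos += 1
            return ("if", then_body, else_body)
        raise SyntaxError(f"unexpected {kw!r}")

    tree = block()
    if pos != len(lines):
        raise SyntaxError(f"unexpected {peek()!r}")
    return tree


tree = parse(SAMPLE)
```

In this sketch a branch body greedily takes every following 'print' and 'do', so in the sample 'print 8' ends up inside the last 'if', as the indentation shows.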

Here is a simplified version of the grammar I'm using:

block     : command* ;

command : (ifcom)=> ifcom
        | print
        | docom
        ;

ifcom   : IF NL noifblock (ELSE NL noifblock)? (ENDIF NL)? ;

noifblock : noifcommand* ;

noifcommand : print | docom ;

print   : PRINT NL ;

docom   : DO NL block NEXT NL ;

// Lexer Rules
IF      : 'if'    ;
ELSE    : 'else'    ;
ENDIF   : 'endif'    ;
PRINT   : 'print'    ;
DO      : 'do'    ;
NEXT    : 'next'    ;
INT     : '0'..'9'+ ;
NL      : '\n' ;
// NL is significant to the parser, so WS must not also consume '\n'
WS      : (' ' | '\t' | '\r')+ {skip();} ;


ANTLR generates these warnings:

[14:15:38] warning(200): if.g:37:4: Decision can match input such as "DO" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
[14:15:38] warning(200): if.g:37:4: Decision can match input such as "PRINT" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input

but the grammar seems to work because of the greedy ifcom rule. Can anyone suggest a better way of doing this?

Thanks for your help.

Dean





