[antlr-interest] Parser testing

Thu Oct 11 05:48:13 PDT 2007

All,

    I'm a relative newbie to parsing in general, and to Antlr
specifically.  I'm a big fan of test-driven development, so I'd like
to write unit tests.  I noticed that the front page has an article
about gUnit... While interesting, it's not exactly what I want for now.  I
haven't downloaded its source to see if and how they handled this.
Maybe that's what I should have done.

    My interest is in finding a way to give a grammar good error
handling while still being able to detect, from a test, that an error
occurred.  I've added something like the following to my grammar:

   private boolean recognitionFailed = false;

   // True if mismatch() has been called since the last reset.
   public boolean getRecognitionFailed() {
       return recognitionFailed;
   }

   public void resetRecognitionFailed() {
       recognitionFailed = false;
   }

   // Record the failure, then fall through to the default error handling.
   protected void mismatch(IntStream input, int ttype, BitSet follow)
       throws RecognitionException
   {
       recognitionFailed = true;
       super.mismatch(input, ttype, follow);
   }
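
The same "recover, but remember" idea can be shown without any Antlr
runtime.  Below is a minimal plain-Java sketch, NOT the Antlr API (the
class and method names are hypothetical), of the declaration rule over
pre-split tokens.  On a mismatch it sets a flag, as the overridden
mismatch() above does, then carries on as if the expected token had
been there:

```java
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a generated parser; not Antlr's API.
class DeclarationRecognizer {
    private static final Set<String> TYPES = Set.of(
        "void", "int", "float", "double", "short", "long", "char", "signed", "unsigned");

    private final List<String> tokens;
    private int pos = 0;
    private boolean recognitionFailed = false;

    DeclarationRecognizer(List<String> tokens) {
        this.tokens = tokens;
    }

    boolean getRecognitionFailed() {
        return recognitionFailed;
    }

    // declaration : TYPE IDENTIFIER ';' ;
    void declaration() {
        match(TYPES.contains(peek()));
        match(!TYPES.contains(peek()) && peek().matches("[A-Za-z_][A-Za-z_0-9]*"));
        match(";".equals(peek()));
    }

    // Next token, or "" once the input is exhausted.
    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "";
    }

    // On success consume the token; on failure record the error and
    // continue without consuming, as if the missing token were inserted.
    private void match(boolean ok) {
        if (ok) {
            pos++;
        } else {
            recognitionFailed = true;
        }
    }
}
```

With "int ;" the identifier is missing, so the flag is set, yet the rule
still runs to the ';' instead of aborting.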

What I'm looking for is the easiest way to do something like this:

//  ... Set up a lexer and parser on the test input.
parser.declaration();
// Check that no error handling or recovery happened.  (It'd
// also be nice to ensure that all of the input was consumed, but I think
// that's doable.)
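
After the start rule returns, the test needs two facts: no error was
recorded, and the input position is at the end.  Here is a plain-Java
sketch of that check for "program : declaration* ;", again with
hypothetical names and pre-split tokens rather than a real Antlr token
stream:

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch mirroring the program/declaration rules; not Antlr's API.
class ProgramChecker {
    private static final Set<String> TYPES = Set.of(
        "void", "int", "float", "double", "short", "long", "char", "signed", "unsigned");

    private final List<String> tokens;
    private int pos = 0;
    private boolean errorSeen = false;

    private ProgramChecker(List<String> tokens) {
        this.tokens = tokens;
    }

    // True only if parsing reported no error AND every token was consumed.
    static boolean parsesCleanly(List<String> tokens) {
        ProgramChecker p = new ProgramChecker(tokens);
        p.program();
        return !p.errorSeen && p.pos == tokens.size();
    }

    // program : declaration* ;  (loop while the next token can start a declaration)
    private void program() {
        while (TYPES.contains(peek())) {
            declaration();
        }
    }

    // declaration : TYPE IDENTIFIER ';' ;
    private void declaration() {
        expect(TYPES.contains(peek()));
        expect(!TYPES.contains(peek()) && peek().matches("[A-Za-z_][A-Za-z_0-9]*"));
        expect(";".equals(peek()));
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "";
    }

    private void expect(boolean ok) {
        if (ok) {
            pos++;
        } else {
            errorSeen = true;
        }
    }
}
```

The two conditions catch different failures: "int ;" sets the error flag,
while "int x ; x" parses without error but leaves the trailing "x"
unconsumed.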

For simple inputs, I'm attempting to parse C-like code with a
grammar like the one below:

program :
	declaration* ;

declaration
	: TYPE IDENTIFIER ';';

identifier : IDENTIFIER ;

TYPE
	: 'void'
	| 'int'
	| 'float'
	| 'double'
	| 'short'
	| 'long'
	| 'char'
	| 'signed'
	| 'unsigned'
	;

IDENTIFIER
	:  ('A'..'Z' | 'a'..'z' | '_')  ('A'..'Z' | 'a'..'z' | '_' | '0'..'9')* ;

WS
	: (
	  ' '
	| '\t'
	| '\n'
	| '\r' )+  {channel=99;};

If you pass in "0bar", the lexer finds the 0, prints an error, and
then proceeds to generate a token for "bar", reporting it as an
IDENTIFIER.  This will get fixed once I teach the lexer to
understand numbers, but it also exhibits my problem fairly quickly.
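
To make the "0bar" case visible to a test, the lexer needs the same
treatment as the parser: count the errors it reports instead of only
printing them.  Below is a plain-Java sketch (hypothetical names, not the
generated Antlr lexer) that mimics the skip-and-continue behaviour
described above.  It reads a whole word and then checks it against the
TYPE keywords, which is this sketch's own design choice:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical hand-written lexer for TYPE, IDENTIFIER, ';' and WS.
class TinyLexer {
    private static final Set<String> TYPES = Set.of(
        "void", "int", "float", "double", "short", "long", "char", "signed", "unsigned");

    int errorCount = 0;

    // Returns tokens as "KIND:text"; whitespace is dropped, like a hidden channel.
    List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                i++;
            } else if (c == ';') {
                out.add("SEMI:;");
                i++;
            } else if (isStart(c)) {
                int start = i;
                while (i < input.length() && isPart(input.charAt(i))) {
                    i++;
                }
                String word = input.substring(start, i);
                out.add((TYPES.contains(word) ? "TYPE:" : "IDENTIFIER:") + word);
            } else {
                // No rule matches (e.g. '0'): count the error, skip one char, keep lexing.
                errorCount++;
                i++;
            }
        }
        return out;
    }

    private static boolean isStart(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
    }

    private static boolean isPart(char c) {
        return isStart(c) || (c >= '0' && c <= '9');
    }
}
```

Tokenizing "0bar" yields a single IDENTIFIER:bar token, exactly the
"looks fine" result, but errorCount is 1, so a test can still fail on it.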

Any help on dealing with this would be much appreciated.  I figured it
should be trivial to use the parser like a regular expression, and
call the equivalent of Java's "Pattern.matches()", but alas, that appears to
be more difficult than I think it should be.  I'm hoping I'm just
missing something blazingly obvious.
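
For comparison, here are the two java.util.regex behaviours the analogy
rests on.  Matcher.find() locates a match anywhere in the input, much like
the lexer skipping the '0' and accepting "bar".  Pattern.matches()
succeeds only if the entire input matches, which is the "no recovery, all
input consumed" check wanted from the parser.  The class name is
illustrative only:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeInputDemo {
    // Same character classes as the IDENTIFIER lexer rule.
    static final String IDENTIFIER = "[A-Za-z_][A-Za-z_0-9]*";

    // Succeeds only if the whole input is one identifier.
    static boolean matchesWhole(String input) {
        return Pattern.matches(IDENTIFIER, input);
    }

    // Returns the first identifier found anywhere in the input, or null.
    static String findAnywhere(String input) {
        Matcher m = Pattern.compile(IDENTIFIER).matcher(input);
        return m.find() ? m.group() : null;
    }
}
```

matchesWhole("0bar") is false while findAnywhere("0bar") returns "bar":
the parser test wants the first behaviour, but error recovery gives
something closer to the second.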

Thanks in advance,
      Kirby