[antlr-interest] gUnit suggestion: treat Lexer and Parser errors the same

Mon Nov 10 13:57:27 PST 2008

I have experimented with gUnit a little and I think it has real
possibilities. I would find it more useful if lexer failures were treated
the same as parser failures.

For instance, gUnit does not allow me to specify a Lexer test, e.g.,
  ID: "93XXX" FAIL

Also, as long as no lexer errors are involved, gUnit treats parser errors
much as I expect a unit tester to: a very simple message is given when all
tests are successful, and only failing tests are reported. However, all
lexer errors are reported, even when the test is marked as FAIL. This means
that if I test that a lexer handles invalid tokens correctly, I have to
scan the "invalid input" messages by hand to make sure that each failure
happened at the point where I expected it.

Even weirder, a parser test marked OK that fails because of a lexer error
is not counted as a test failure, although it is listed as an "invalid
input".
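
The rule being asked for can be written down as a small decision function.
The following is only a sketch in Python, not gUnit's actual code: the names
Outcome, observed_verdict and proposed_verdict are made up for illustration,
and observed_verdict is just a reading of the behaviour described in this
message.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    expected_ok: bool    # True for a test marked OK, False for one marked FAIL
    lexer_error: bool    # the lexer reported an error on the input
    parser_error: bool   # the parser reported an error on the input


def observed_verdict(o: Outcome) -> str:
    """Behaviour as described in this message (an interpretation, not gUnit source)."""
    if o.lexer_error:
        # Reported whatever the OK/FAIL mark says, and never counted as a failure.
        return "invalid input"
    actual_ok = not o.parser_error
    return "pass" if actual_ok == o.expected_ok else "failure"


def proposed_verdict(o: Outcome) -> str:
    """Suggested rule: a lexer error counts exactly like a parser error."""
    actual_ok = not (o.lexer_error or o.parser_error)
    return "pass" if actual_ok == o.expected_ok else "failure"
```

Under the suggested rule a test marked FAIL whose input trips the lexer is
a quiet pass, and a test marked OK whose input trips the lexer is an
ordinary failure.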

As it stands, gUnit is only dependable when the lexer makes no mistakes. If
I adopt a strategy of assuming that my lexer works, then any message at all
means that something went wrong. But note that there may be false negatives:
if my lexer collects characters into an unexpected and goofy set of tokens,
a test may FAIL or be OK for a different reason than the one I think I am
testing, leaving the original problem untested.
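
To make the "goofy set of tokens" concern concrete, here is a toy tokenizer
in Python. The token rules in it are an assumption (a simplified guess at
Expr-style INT, ID, operator and whitespace rules), so the real lexer may
well behave differently; the point is only that an input meant to be one bad
token can come out as two perfectly good ones, with no lexer error at all.

```python
import re

# Assumed token rules; a hypothetical simplification, not the actual grammar.
TOKEN_SPEC = [
    ("INT", r"[0-9]+"),
    ("ID", r"[A-Za-z]+"),
    ("OP", r"[-+*()=]"),
    ("WS", r"[ \t]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return (type, text) pairs; raise ValueError on a character no rule matches."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"no token matches at position {pos}: {text[pos]!r}")
        if m.lastgroup != "WS":  # whitespace is skipped, not emitted
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize("93XXX"))  # [('INT', '93'), ('ID', 'XXX')]
```

With rules like these, a test on "93XXX" marked FAIL can only succeed
because the parser objects to an INT followed by an ID, which is not the
lexer check the test was written for.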

Here is a sample testsuite:
------------- file follows this line ----------------
/**Expr.testsuite - tests for the Expr grammar of ANTLR Ref, Ch. 3
 *  (really this is to test the use of gunit)
 */

//   why not test Lexer as well as Parser? e.g.,
//   ID: "93XXX" FAIL 

gunit Expr;

stat
: "99 = a" FAIL     // --> Marked as FAIL, 
                    //     so should not give INVALID INPUT message <--
  "99 = a" OK       // --> INVALID INPUT, BUT SHOULD ALSO BE FAILURE <--

  <<a = 99>> FAIL   // because newline required

expr
: "12*a + B * 93XXX"  OK     // --> Marked as OK, so should give FAILURE msg
                             //     (as well as INVALID INPUT?) <--
  "5+ a*Z"         OK
  "5+ a - b*c*22"  OK
  "+21"            FAIL
  "-12"            FAIL  
  "a - -3"         FAIL
  "b++"            FAIL
  "5-(3-(4-6-2))"  OK
  "5-(a)-()"       FAIL

multExpr
: "a*3" OK
  "4" OK
  "B * 93XXX" FAIL           // --> Marked as FAIL, 
                             //     so should not give INVALID INPUT message <--
  "B" OK
  "2*3" OK
  "(2*4)" OK

atom
: "93XXX" FAIL               // --> Marked as FAIL, 
                             //     so should not give INVALID INPUT message <--
  "KA"      OK 
  "ka"      OK   

  "93"           OK
  "9 "           OK
  "  ( 92doo ) " FAIL
  "  ( 92    ) " OK
  "  ( doo   ) " OK
  "  (92 "       FAIL
  "()"           FAIL
----------- end of file -------------

And here is the output:
========== file follows this line ============
-----------------------------------------------------------------------
executing testsuite for grammar:Expr with 28 tests
-----------------------------------------------------------------------
0 failures found:
5 invalid inputs found:
test1 (stat, line12) - 
invalid input: 99 = a
test2 (stat, line14) - 
invalid input: 99 = a
test4 (expr, line19) - 
invalid input: 12*a + B * 93XXX
test15 (multExpr, line33) - 
invalid input: B * 93XXX
test19 (atom, line40) - 
invalid input: 93XXX

Tests run: 28, Failures: 0
=========== end of file ============
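
For comparison, the summary can be recounted under the suggested rule. This
is a small Python sketch, not gUnit output: the test ids and OK/FAIL marks
are copied from the testsuite and report in this message, the helper name
recount is made up, and it assumes the other 23 tests keep passing as
reported.

```python
# (test id, mark in the testsuite) for the five inputs reported as invalid.
flagged = [
    ("test1", "FAIL"),
    ("test2", "OK"),
    ("test4", "OK"),
    ("test15", "FAIL"),
    ("test19", "FAIL"),
]


def recount(flagged_tests):
    """An invalid input is a failure only when the test was marked OK."""
    return [name for name, mark in flagged_tests if mark == "OK"]


failures = recount(flagged)
print(f"Tests run: 28, Failures: {len(failures)}")
for name in failures:
    print(name)
```

That gives two failures (test2 and test4), and the three tests marked FAIL
would no longer need to be listed.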

George
