[antlr-interest] SUGGESTION: Make unit testing available for Composite Grammars

Thu Feb 19 17:44:07 PST 2009

On Feb 19, 2009, at 1:11 PM, George S. Cowan wrote:

> ANTLR's new Composite Grammar facility presents some difficulties  
> when trying to use good unit testing practices, and I hope that  
> these difficulties can be addressed in an upcoming release. The  
> problems apply to both gUnit and JUnit.
>
> In both of the cases that follow, the general problem is that a test  
> must include code that references the level at which a rule is  
> defined - thus a test will break when a fix is implemented by  
> overriding the rule at a new level. A test will also break when a  
> change is made to a rule by overriding it at a new level, even if  
> the change does not affect that particular test.
>
> No matter how hard it may be to change ANTLR and its templates, at  
> least the fix for the problems is simple to state: "ANTLR must  
> delegate all rule access from the top level in a way that external  
> callers are not required to know about which level the rule is  
> delegated to."

That was my intent anyway ;)  The top level is the only reliable object  
to test, I think.

> First, an easy problem. The lexers generated by ANTLR 3.1.1 do not  
> delegate their rules from the top level at all.

I think we have a number of errors related to this; for example:

http://www.antlr.org/jira/browse/ANTLR-353
http://www.antlr.org/jira/browse/ANTLR-386

> For a workaround, see http://www.antlr.org/pipermail/antlr-interest/2009-January/032201.html
> My impression is that this Lexer problem can be easily solved by  
> having ANTLR add the delegating rules to the main lexer, but my  
> experience is only with very straightforward lexers.

Well, the issue I see is that I don't think I intended people to call  
individual lexer rules, even from the root grammar. I expect all  
lexical stuff to go through the implicit Tokens rule invoked from  
nextToken().

> THE BIGGER PROBLEM
>
> AST results generated from a rule have a return type that depends  
> upon the subgrammar in which the rule is defined. Here is an example  
> of a delegation from a top level grammar C that might be  
> generated in its parser CParser.java:
>
>   public C_P2_P1.spaces_return spaces() throws RecognitionException {
>     return gP1.spaces();
>   }
>
> (For those not already familiar with this code: The spaces()  
> function returns a value of type "C_P2_P1.spaces_return", which is  
> a subclass defined in the class "C_P2_P1" generated by ANTLR for  
> the grammar P1 which is imported by P2 which is imported by the top  
> level grammar, C. This "spaces_return" subclass contains, among  
> other things, the tree that must be accessed by a test of the rule  
> for "spaces".)
>
> Therefore, the testing program must know that the rule is delegated  
> to the P1 subgrammar in order to know the return type of the rule.  
> If the spaces rule is overridden in the C grammar, the return value  
> will become "CParser.spaces_return" and any tests that assume  
> "C_P2_P1.spaces_return" will be broken, even if the issue they are  
> testing still works.

Can we use the generic ParserRuleReturnScope object?  Hmm...that won't  
work because we might have special elements in there that we need to  
access. crap.

Again, this might be a problem because we are trying to access these  
objects directly.
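
To make the trade-off concrete, here is a minimal, self-contained Java sketch of a test that depends only on a common return-scope base type. Every class in it is a hypothetical stand-in that mimics the shape of the generated code quoted above; none of it is the real ANTLR runtime or real generator output, and the tree values are plain strings for illustration.

```java
// Stand-in classes only: they mimic the shape of ANTLR 3 generated code
// but are NOT the real runtime or real generator output.
class ReturnScopeSketch {

    // Plays the role of a common base class for all rule return values.
    static class RuleReturnScope {
        Object tree;
        Object getTree() { return tree; }
    }

    // Plays the role of the delegate parser generated for grammar P1.
    static class C_P2_P1 {
        static class spaces_return extends RuleReturnScope {}

        spaces_return spaces() {
            spaces_return r = new spaces_return();
            r.tree = "(SPACES from P1)";
            return r;
        }
    }

    // Root parser, variant 1: spaces() is delegated to P1.
    static class CParserDelegating {
        final C_P2_P1 gP1 = new C_P2_P1();

        C_P2_P1.spaces_return spaces() { return gP1.spaces(); }
    }

    // Root parser, variant 2: spaces() is overridden in C itself,
    // so the declared return type changes.
    static class CParserOverriding {
        static class spaces_return extends RuleReturnScope {}

        spaces_return spaces() {
            spaces_return r = new spaces_return();
            r.tree = "(SPACES from C)";
            return r;
        }
    }

    // Test helper that only knows the base type, so it compiles
    // against either variant of the root parser.
    static String treeOf(RuleReturnScope r) {
        return String.valueOf(r.getTree());
    }

    public static void main(String[] args) {
        RuleReturnScope delegated = new CParserDelegating().spaces();
        RuleReturnScope overridden = new CParserOverriding().spaces();
        System.out.println(treeOf(delegated));
        System.out.println(treeOf(overridden));
    }
}
```

A test written this way survives moving the rule between grammar levels, but only for members declared on the base type. Any rule-specific return value still requires the concrete class, which is the objection raised above.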

> Note that the problem applies not just with tests of AST trees but  
> with tests of any value returned from a rule, which includes string  
> template results.

yup. :(

> SUGGESTED SOLUTIONS
>
> One possible solution is to define all the rule_return subclasses in  
> the generated parser for the top level, no matter what level the  
> rule is from, e.g., "CParser.spaces_return". Since all the generated  
> subgrammar parsers know about the top level parser, this should not  
> cause any problems.
>
> An alternative is for every rule to return a Map or an array of  
> maplets. For ordinary programming, I don't like this as much because  
> its generality introduces too many opportunities for errors. In  
> generated code, it's more controlled and simplifies ANTLR's  
> situation where there must be many different kinds of return  
> collections. It also gets rid of all those subclass files in the  
> class directory, like "C_P2_P1$spaces_return.class". On the negative  
> side, I expect it would use a little more space and time for each  
> return value.
>
> A compromise would be to define a single class that had the usual  
> start, stop, tree, and/or st values along with a map to hold extra  
> return values when needed.
>
> The key issue is to maintain the unit test pattern while allowing  
> all the new patterns that Composite grammars introduce.
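
The compromise suggested in the quoted text (one return class with the usual start, stop, tree, and st values plus a map for extras) might look roughly like the following Java sketch. The class name GenericRuleReturn, its put/get methods, and the "count" entry are hypothetical names chosen for illustration; nothing here is something ANTLR generates.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single return type shared by every rule at every grammar level.
class GenericRuleReturn {
    Object start;   // first token matched by the rule
    Object stop;    // last token matched by the rule
    Object tree;    // AST result, if any
    Object st;      // template result, if any

    // Rule-specific return values, keyed by their declared names.
    private final Map<String, Object> extras = new HashMap<String, Object>();

    void put(String name, Object value) {
        extras.put(name, value);
    }

    // Typed lookup: fails loudly on a missing name or a wrong type,
    // which recovers some of the checking that a plain Map gives up.
    <T> T get(String name, Class<T> type) {
        if (!extras.containsKey(name)) {
            throw new IllegalArgumentException("no return value named " + name);
        }
        return type.cast(extras.get(name));
    }

    public static void main(String[] args) {
        GenericRuleReturn r = new GenericRuleReturn();
        r.tree = "(SPACES)";
        r.put("count", 3);
        System.out.println(r.tree + " count=" + r.get("count", Integer.class));
    }
}
```

With a shape like this, a test never names the grammar level. The cost is that a misspelled or mistyped extra value is caught at run time instead of compile time.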

Hmm...would it be possible to test everything you need to test going  
through the root grammar rather than directly to the delegates?

Ter