[antlr-interest] Testing ANTLR grammar as a whole.

Wed May 5 13:04:18 PDT 2004

Hi Mark,

Thanks for a wonderful reply. It's almost like you anticipated all my
questions and answered everything in one large email. I really appreciate
this help.

I am thinking of creating a top-level rule that would closely resemble
script_input: statement | declaration | query ;

Thanks again,

Bharath.

-----Original Message-----
From: Mark Lentczner [mailto:markl at glyphic.com] 
Sent: Wednesday, May 05, 2004 2:36 PM
To: antlr-interest at yahoogroups.com
Subject: Re: [antlr-interest] Testing ANTLR grammar as a whole.

Bharath -

I'm assuming that your language has several top-level rules.  For  
example:

statement: ... ;
declaration: ... ;
query: ... ;

When a script is parsed, you expect it to match one of these.  In which  
case, you would be best served by having a single top-rule above these:

script_input: statement | declaration | query ;

This will serve to help the testing function, but also allow Antlr to  
check and point out any ambiguities between these different types of  
input.

Now, if your aim is to have a single large file with a sequence of  
scripts in it so that you can check that they all parse, then one  
approach is:

test_input: (script_unit)* ;
script_input: script_unit ;
script_unit: statement | declaration | query ;

Both top rules call the now-intermediate rule script_unit so that an  
EOF is correctly required after script_input, but not between  
script_units.  There is a possible problem with this: if your grammar  
relies on the EOF to know where, for example, a statement ends, then  
this won't work for test_input.  You'll need to define a special,  
testing-only token:

SCRIPT_DIVIDER: "---EOF---" NL ;

and then have

test_input: (script_unit SCRIPT_DIVIDER)* ;
script_input: script_unit ;
script_unit: statement | declaration | query ;
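
If you would rather not add a testing-only token to the grammar, a
different technique is to split the file in the test harness and hand each
piece to the normal script_input rule.  The following is only a rough
sketch of that alternative: splitScripts is a hypothetical helper, and it
assumes the divider sits alone on its own line.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split the contents of a test file into scripts.
// A line consisting solely of the divider ends the current script.
std::vector<std::string> splitScripts(const std::string& contents,
                                      const std::string& divider = "---EOF---")
{
    std::vector<std::string> scripts;
    std::istringstream in(contents);
    std::string line;
    std::string current;
    while (std::getline(in, line)) {
        if (line == divider) {
            scripts.push_back(current);
            current.clear();
        } else {
            current += line;
            current += '\n';
        }
    }
    // Keep a trailing script that has no closing divider.
    if (!current.empty()) {
        scripts.push_back(current);
    }
    return scripts;
}
```

Each returned string can then be parsed on its own, so a real EOF still
terminates every script and a failure points at one script rather than at
the whole file.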

In the end, though, I don't advocate this kind of testing:  A single  
monolithic file would only serve to tell you that it parses, not that  
it parses correctly.  And if it doesn't parse it would be harder to see  
what is wrong.  Lastly, it isn't automatic.

My parser tests run along these lines:

void test_binary_op()
{
     antlr::RefAST result = compile("{ a + b; }");
     string resultStr = printAST(result);
     CHECK_SAME(" (BLOCK (BINARY ID<+> (LOCAL ID<a>) (LOCAL ID<b>)))",
                resultStr);
}
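
For readers without the project at hand, here is a self-contained sketch
of how such a tree-to-string comparison might work.  Node, printTree,
checkSame and sampleTree are hypothetical stand-ins invented for
illustration; they are not the antlr::RefAST API nor the project's actual
compile, printAST or CHECK_SAME helpers, and the tree is built by hand
rather than by a parser.

```cpp
#include <cassert>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for an AST node (not antlr::RefAST).
struct Node {
    std::string type;            // token type name, e.g. "BINARY" or "ID"
    std::string text;            // token text, printed for leaves only
    std::vector<Node> children;
};

// Render a tree as a one-line, LISP-style string.  Every node is
// preceded by one space, so the result starts with a space.
std::string printTree(const Node& n)
{
    if (n.children.empty()) {
        return " " + n.type + "<" + n.text + ">";
    }
    std::string out = " (" + n.type;
    for (const Node& child : n.children) {
        out += printTree(child);
    }
    return out + ")";
}

// Hypothetical check helper: report a mismatch, return whether it passed.
bool checkSame(const std::string& expected, const std::string& actual)
{
    if (expected == actual) {
        return true;
    }
    std::cerr << "expected:" << expected << "\n  actual:" << actual << "\n";
    return false;
}

// Hand-built tree of the shape a parser might produce for "{ a + b; }".
Node sampleTree()
{
    Node op{"ID", "+", {}};
    Node a{"LOCAL", "", {Node{"ID", "a", {}}}};
    Node b{"LOCAL", "", {Node{"ID", "b", {}}}};
    Node binary{"BINARY", "", {op, a, b}};
    return Node{"BLOCK", "", {binary}};
}

void test_binary_op_sketch()
{
    std::string resultStr = printTree(sampleTree());
    assert(checkSame(" (BLOCK (BINARY ID<+> (LOCAL ID<a>) (LOCAL ID<b>)))",
                     resultStr));
}
```

Comparing against a printed form keeps each test to a few lines and makes
a failure readable: the expected and actual trees appear side by side.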

My current project's grammar has about 160 tests in this style.  They  
run at startup every time I run the project.  When I break even the  
slightest piece of the compiler - I know it immediately, and I know  
exactly where.

While my grammar does have only one top-level rule, I did define one  
additional top-level rule for testing so that I could test fragments of  
scripts without having to write all the declarations needed to get to  
that point in the grammar.

You can see the actual test suite at:
	http://cvs.sourceforge.net/viewcvs.py/wheat/r1/grammar/ScriptParser-test.cpp?view=markup

	- Mark

Mark Lentczner
markl at wheatfarm.org
http://www.wheatfarm.org/
