[antlr-interest] Syntax based testing for ANTLR grammars?

Sat Sep 22 16:58:13 PDT 2012

Hi Michael,

In case you haven't found it, if you search antlr.mailmark.org for "random
phrase" you will find:
http://antlr.markmail.org/search/?q=random%20phrase#query:random%20phrase+page:1+mid:z3jvawmd46xebel7+state:results

Ter gave a reference to Ralf Lämmel.

http://softlang.wikidot.com/rlaemmel:home

He list some papers related to test case generation from a grammar. I did
not find the papers I originally read from a year ago but if I remember
correctly, he also used a seed.

I mention the seed because in my proof of concept I did not use a seed as I
wanted no randomness in the test cases. Even if you can test 10**7 ( ten
million) cases and you can show that there are 10**30 basic cases you still
haven't put much of a dent into the test cases.

In all of the tools I have seen they always use a seed.

As food for thought you might want to see if that is true about the seeds
and create a generator not using a seed.

But then again, you could go for the seed and I could get some seed money
and start a company. :)

Eric

On Sat, Sep 22, 2012 at 7:27 PM, Michael Barry <msb5014 at gmail.com> wrote:

> Thanks for the feedback!  The random phrase generator provided a good
> base to get started, but to make it work reliably, I had to change the
> version in the 3.4 distribution to 1) not generate identifiers that
> conflicted with keywords, and 2) limit its internal stack size so that
> if it gets too big it just gives up and starts over.  Once I did that
> it started generating ~300,000 random syntactically valid productions
> per minute, and it has been generating some interesting findings :-)
>
> I am working on my Masters right now, and I think that this may be a
> good topic for thesis.
>
> Thanks
> Mike Barry
>
> On Sep 20, 2012, at 12:39 PM, Eric <researcher0x00 at gmail.com> wrote:
>
> > Hi Mike,
> >
> > "What I wonder is if this way test is really meaningful. I mean, when you
> > create test sentences from the grammar they will always work, no? "
> >
> > Yes the test are meaningful but only meaningful if you don't blindly
> > generate phrases and accept the output without verification.. Since you
> > don't want to hand validate each result you make the generator smart
> enough
> > and give it semantic knowledge to tag the results as this should be valid
> > and that should be invalid, then look for a negative result. i.e. a test
> > that should pass that fails or a test that should fail that passes. You
> > should still hand check a random set of results. Note that even with all
> > this testing it does not guarantee the app using the ANTLR is correct, it
> > only helps to increase the level of confidence, unless of course the
> > grammar is so simple that you can exhaustively test all of the cases.
> Also
> > remember that if the generator is smart enough it can focus on a
> particular
> > portion of the grammar and run many pivot examples off that portion which
> > just might be a complete set of test cases.
> >
> > "Since your parser uses the same grammar as the test input stems from
> there
> > can only be test failures if your grammar uses actions to control what
> gets
> > parsed how. But for a pure grammar you can only generate test cases when
> > you take your language spec (not codified in a grammar) and create test
> > cases out of it."
> >
> > When I created my proof of concept tool I did give it the ability to
> > generate both positive and negative results. I was also using C++ and
> from
> > the C++ reference grammar  I created one grammar for input to ANTLR and
> one
> > grammar for input into my phrase generator. I then used the phrases from
> > the generator as input into the lexer/parser generated by ANTLR and then
> > looked at the results.
> >
> > There is a lot more to this than I explain here but I personally found
> the
> > proof of concept tool to be very educational. Many results that came out
> > for valid were unlike any C++ code I have ever seen from a human but they
> > did compile. The reason is the generator does not think like a human, it
> > just does a tree walk based on the rules and generates phrases. Where I
> > left off was creating another layer to take common human type test
> > conditions and turn them into the rules to be used by the generator.
> >
> > After the proof of concept tool worked I moved onto another project
> because
> > what I really wanted out of the tool was what was learned in doing it
> more
> > than the tool itself. The next steps of converting human conditions into
> > rules and creating a GUI front end would have required more work than I
> was
> > interested in doing.
> >
> > Eric
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On Thu, Sep 20, 2012 at 11:53 AM, Mike Lischke <mike at lischke-online.de
> >wrote:
> >
> >>
> >> What I wonder is if this way test is really meaningful. I mean, when you
> >> create test sentences from the grammar they will always work, no? Since
> >> your parser uses the same grammar as the test input stems from there can
> >> only be test failures if your grammar uses actions to control what gets
> >> parsed how. But for a pure grammar you can only generate test cases when
> >> you take your language spec (not codified in a grammar) and create test
> >> cases out of it.
> >>
> >>> I did some work on my own on this a year ago and the problem you get
> into
> >>> is a huge explosion in output. Just to manage the problem of the
> >> explosion
> >>> I had to create methods just to count the possibilities given a grammar
> >> and
> >>> the easiest way for me to relate to the size was as a power. So for one
> >>> grammar the set could have 10**15 results and others could have 10**30
> >>> results. This is not even something that can be reasonably tested.
> >>>
> >>>>
> >>>> Is anyone aware of any projects out there for doing syntax-based
> >>>> testing using ANTLR grammars?  The simplest form would be a tool for
> >>>> generating valid language productions from an ANTLR grammar, but more
> >>>> complex might introduce mutations.
> >>>>
> >>>>
> >>>> Thanks
> >>>> Mike Barry
> >>>>
> >>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> >>>> Unsubscribe:
> >>>>
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> >>>>
> >>>
> >>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> >>> Unsubscribe:
> >> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> >>
> >> Mike
> >> --
> >> www.soft-gems.net
> >>
> >>
> >>
> >
> > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > Unsubscribe:
> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>