[antlr-interest] automated randomized parser testing

Terence Parr parrt at cs.usfca.edu
Tue Oct 4 17:15:21 PDT 2005


Hi.  You may recall that I was playing around with generating random  
sentences from grammars for testing purposes.  Well, I finished a  
little more of that in preparation for the ANTLR2005 workshop.  Reto  
Kramer, who you may know from his iContract tool, asked me about  
pounding servers that listen for a specific protocol; i.e., what  
automation can be done to hit parsers, interpreters, and translators  
with random but syntactically correct sentences?

Here is an example.  I give it a grammar called simple.g and a  
starting rule, then let it rip!

/tmp $ java org.antlr.tool.RandomPhrase simple.g program
int H = 873 ';' method j '(' ')' '{' int a ';' int b ';' return a ';' '}'
/tmp $ java org.antlr.tool.RandomPhrase simple.g program
method SD '(' ')' '{' int Ta = 3 ';' int F ';' return 0 ';' '}' method SaE '(' ')' '{' int L ';' int BdT ';' return x ';' x = Fg ';' CDSCO = x ';' '}'
/tmp $ java org.antlr.tool.RandomPhrase simple.g program
int Ktcdn ';' method wh '(' ')' '{' return 5 ';' '}'
/tmp $ java org.antlr.tool.RandomPhrase simple.g program
method k '(' ')' '{' z = BqehVnH ';' '}' method r '(' ')' '{' g = 32 ';' '}' method c '(' ')' '{' int X ';' return Gs ';' vm = 134 ';' '}'
...

Notice that it first gets a random phrase of token types and then  
asks the lexer repeatedly for a random token of each type.
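
For readers who want to experiment with the two-step idea (token types first, then a random token for each type), here is a rough Python sketch. It is not the RandomPhrase implementation: the rule table is hand-transcribed from the grammar in this message with the tree rewrites dropped, the helper rule assign_opt and the function names are made up for illustration, the repetition bounds are arbitrary, and the token texts are hardcoded rather than obtained from a real lexer.

```python
import random
import string

# Parser rules transcribed by hand from the grammar in this message.
# Each rule is a list of alternatives; each alternative is a list of
# (symbol, suffix) pairs, where suffix is "", "?", "*" or "+".
RULES = {
    "program":    [[("variable", "*"), ("method", "+")]],
    "variable":   [[("INTTYPE", ""), ("ID", ""), ("assign_opt", "?"), ("';'", "")]],
    "assign_opt": [[("ASSIGN", ""), ("expr", "")]],  # hypothetical helper for (ASSIGN expr)?
    "method":     [[("METHOD", ""), ("ID", ""), ("'('", ""), ("')'", ""), ("'{'", ""),
                    ("variable", "*"), ("statement", "+"), ("'}'", "")]],
    "statement":  [[("ID", ""), ("ASSIGN", ""), ("expr", ""), ("';'", "")],
                   [("RETURN", ""), ("expr", ""), ("';'", "")]],
    "expr":       [[("ID", "")], [("INT", "")]],
}

# Assumed (arbitrary) bounds on how often each kind of subrule repeats.
REPEAT = {"": (1, 1), "?": (0, 1), "*": (0, 2), "+": (1, 3)}

# Token types whose text is fixed.
FIXED = {"ASSIGN": "=", "RETURN": "return", "INTTYPE": "int", "METHOD": "method"}


def random_phrase(rule, rng):
    """Step 1: expand a rule into a flat list of token types."""
    phrase = []
    for symbol, suffix in rng.choice(RULES[rule]):
        low, high = REPEAT[suffix]
        for _ in range(rng.randint(low, high)):
            if symbol in RULES:
                # Nonterminal: recurse (this grammar has no recursion, so it terminates).
                phrase.extend(random_phrase(symbol, rng))
            else:
                phrase.append(symbol)
    return phrase


def random_token(token_type, rng):
    """Step 2: produce random text for one token type."""
    if token_type in FIXED:
        return FIXED[token_type]
    if token_type == "ID":
        while True:
            length = rng.randint(1, 7)
            text = "".join(rng.choice(string.ascii_letters) for _ in range(length))
            if text not in FIXED.values():  # skip IDs that would lex as keywords
                return text
    if token_type == "INT":
        return "".join(rng.choice(string.digits) for _ in range(rng.randint(1, 3)))
    return token_type  # quoted literal such as ';' is printed as-is


def random_sentence(start, rng):
    return " ".join(random_token(t, rng) for t in random_phrase(start, rng))


if __name__ == "__main__":
    print(random_sentence("program", random.Random()))
```

Each run prints one sentence in the same style as the output above, with literals left quoted.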

The grammar is

grammar SimpleParser;
options {output=AST;}
program : variable* method+
         ;
variable: INTTYPE ID (ASSIGN expr)? ';' -> ^(INTTYPE ID ^(ASSIGN expr)?)
         ;
method  : METHOD ID '(' ')'
           '{'
               variable* statement+
           '}'
           -> ^(METHOD ID variable* statement+)
         ;
statement
         : ID ASSIGN expr ';' -> ^(ASSIGN ID expr)
         | RETURN expr ';'    -> ^(RETURN expr)
         ;
expr    : ID | INT
         ;
ASSIGN  : '=' ;
RETURN  : "return";
INTTYPE : "int";
METHOD  : "method";
ID      : ('a'..'z'|'A'..'Z')+ ;
INT     : ('0'..'9')+ ;
WS      : (' '|'\t'|'\n')+ {channel=99;}
         ;

This is pretty useful when your grammar has actions: you can just let it  
run overnight and see whether you can make your system crash.
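
A minimal sketch of what such a run-until-it-breaks loop might look like, in Python. Everything here is an assumption for illustration: fuzz, toy_generate, and toy_target are made-up names, the generator is a stand-in for a real random sentence source, and the target is a deliberately buggy stand-in for a parser with actions.

```python
import random
import time


def fuzz(generate, target, seconds=1.0, seed=0, max_failures=5):
    """Feed generated sentences to `target` until the time budget runs out.

    Returns a list of (sentence, exception) pairs for inputs that raised.
    """
    rng = random.Random(seed)  # fixed seed makes the generated sequence repeatable
    deadline = time.monotonic() + seconds
    failures = []
    while time.monotonic() < deadline and len(failures) < max_failures:
        sentence = generate(rng)
        try:
            target(sentence)
        except Exception as exc:  # anything the system under test raises is a finding
            failures.append((sentence, exc))
    return failures


def toy_generate(rng):
    # Stand-in generator: "return <ID-or-INT> ';'" with a random operand.
    operand = rng.choice(["x", "y", str(rng.randint(0, 999))])
    return f"return {operand} ';'"


def toy_target(sentence):
    # Stand-in system under test with a deliberate bug:
    # it assumes the operand is always an integer.
    int(sentence.split()[1])


if __name__ == "__main__":
    for sentence, exc in fuzz(toy_generate, toy_target, seconds=0.2):
        print(f"{sentence!r} -> {type(exc).__name__}: {exc}")
```

For an overnight run, raise seconds and max_failures and log the seed so that a failing sentence can be regenerated later.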

This is all done in an interpreted manner; no code gen or anything.

Cool, eh?

Ter

