Predicting Mutation Scores

Last week my MSc student, Kevin Jalbert, presented his early thesis results at the Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE 2012). The workshop took place in Zurich, Switzerland and was co-located with ICSE 2012. The title of the presentation (and of the paper that appears in the proceedings) was “Predicting Mutation Score Using Source Code and Test Suite Metrics.” The paper received the Best Paper Award at the workshop.

Mutation testing can be used to evaluate the effectiveness of test suites and can also be used as an oracle during the creation or improvement of test suites. Mutation testing works by creating many versions of a program, each with a single syntactic fault. These program versions are created using mutation operators, which are based on an existing fault taxonomy (i.e., a set of known fault types that we are trying to find during testing). One mutation operator, Relational Operator Replacement (ROR), creates a new mutant version of the program in which one instance of a relational operator (e.g., <) is replaced with a different operator. For example, line 3 of the following Java source code:

int x=1;
int y=0;
if (x > y) {
	//print x is bigger
	//...
} else {
	//print x is not bigger
	//...
}

could be mutated by the ROR operator to produce the mutant:

int x=1;
int y=0;
if (x < y) { // > changed to <
	//print x is bigger
	//...
} else {
	//print x is not bigger
	//...
}

This process is repeated for every instance of a relational operator (and similarly for the other mutation operators), which can lead to a large number of mutants. Once the mutants have been generated, the test suite is evaluated against each one to determine the percentage of mutants it is able to detect (kill); this percentage is known as the mutation score. A test kills a mutant if it can tell the difference between the mutant and the original, unmutated program (which is assumed to be correct). Mutation testing has been researched for over 30 years and has been shown to be effective both as a test coverage measure and as a test oracle. Although mutation testing can be effective, a major challenge to adopting it in practice is its cost: even a small program may yield thousands of mutants, and running the test suite against every one of them is extremely expensive. For example, in our research a single class of one program generated over 6000 mutants!
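
To make the kill condition concrete, suppose the comparison above lived in a small helper method (a hypothetical example, not code from our study), say isBigger(int x, int y). A JUnit test along the following lines would kill the ROR mutant shown above, because the original and the mutant return different results for the same input:

import static org.junit.Assert.assertTrue;
import org.junit.Test;

public class IsBiggerTest {

	// Hypothetical method under test: the original returns x > y,
	// while the ROR mutant would return x < y.
	static boolean isBigger(int x, int y) {
		return x > y;
	}

	@Test
	public void killsRorMutant() {
		// Original: isBigger(1, 0) is true.
		// Mutant (x < y): isBigger(1, 0) is false.
		// The assertion passes on the original and fails on the mutant,
		// so this test kills the mutant.
		assertTrue(isBigger(1, 0));
	}
}

The mutation score of a test suite is then simply the number of mutants it kills divided by the total number of mutants generated.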

The research work presented in our RAISE 2012 paper is aimed at improving the utility of mutation testing by reducing the need to check all mutants of all units of code. In our paper we present a machine learning approach that predicts mutation scores based on a combination of source code and test suite metrics. We used the following tools to implement our approach:

LIBSVM – a support vector machine (SVM) library, used to learn and predict mutation scores
Eclipse Metrics plugin – used to gather source code metrics
EMMA – used to gather test coverage metrics for the test suites

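As a rough illustration of the prediction step (a minimal sketch only: the feature values, class labels, and SVM parameters below are placeholders rather than the configuration used in the paper), LIBSVM's Java API can be driven along these lines:

import libsvm.*;

public class MutationScorePredictor {

	// Train an SVM classifier from per-unit feature vectors (e.g., source
	// code and coverage metrics) and class labels (e.g., 0 = low mutation
	// score, 1 = high mutation score).
	static svm_model train(double[][] features, double[] labels) {
		svm_problem prob = new svm_problem();
		prob.l = features.length;
		prob.y = labels;
		prob.x = new svm_node[features.length][];
		for (int i = 0; i < features.length; i++) {
			prob.x[i] = toNodes(features[i]);
		}

		svm_parameter param = new svm_parameter();
		param.svm_type = svm_parameter.C_SVC;
		param.kernel_type = svm_parameter.RBF;
		param.C = 1;          // placeholder values; in practice these would
		param.gamma = 0.5;    // be tuned (e.g., by cross-validation)
		param.cache_size = 100;
		param.eps = 0.001;

		return svm.svm_train(prob, param);
	}

	// Predict the class of a new unit of code from its metrics.
	static double predict(svm_model model, double[] features) {
		return svm.svm_predict(model, toNodes(features));
	}

	// LIBSVM represents feature vectors as arrays of (index, value) nodes.
	private static svm_node[] toNodes(double[] features) {
		svm_node[] nodes = new svm_node[features.length];
		for (int i = 0; i < features.length; i++) {
			nodes[i] = new svm_node();
			nodes[i].index = i + 1; // LIBSVM feature indices start at 1
			nodes[i].value = features[i];
		}
		return nodes;
	}
}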
For the details of how we used LIBSVM to predict mutation scores from a feature set gathered with the Eclipse Metrics plugin and the EMMA test coverage tool, you can read our paper or review our presentation slides.
