`Name:`

Supervised classification algorithms

`Domain:`

Machine learning

`Functionality:`

k-nearest neighbors and Naïve Bayes classifiers.

`Input:`

The testing data input is an unlabeled test case $t_s$.

`Output:`

The aim is to determine its class label $c_t$.
Testing and validating machine learning classifiers by metamorphic testing, https://doi.org/10.1016/j.jss.2010.11.920

The training data can be represented by two vectors of size $k$. One vector is for the $k$ training samples $S=\langle s_0,s_1,\ldots ,s_{k-1} \rangle$ and the other is for the corresponding class labels $C=\langle c_0,c_1,\ldots ,c_{k-1}\rangle$. Each sample $s\in S$ is a vector of size $m$, which represents $m$ attributes from which to learn. Each label $c_i$ in $C$ is an element of a finite set of class labels, that is, $c_i \in L=\{l_0,l_1,\ldots ,l_{n-1}\}$, where $n$ is the number of possible class labels.
For a training sample set $S$, suppose each sample has $m$ attributes, $\langle att_0,att_1,\ldots,att_{m-1}\rangle$, and there are $n$ classes in $S$, $\{l_0,l_1, \ldots,l_{n-1}\}$. The value of the test case $t_s$ is $\langle a_0,a_1,\ldots,a_{m-1}\rangle$.
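
To make the notation above concrete, here is a minimal Python sketch of the inputs and the output. The helper `knn_predict`, the parameter `n_neighbors`, and the toy data are assumptions made for illustration; they are not the implementation studied in the cited paper.

```python
from collections import Counter
import math

def knn_predict(S, C, t_s, n_neighbors=3):
    """Plain k-NN: Euclidean distance, majority vote over the nearest samples."""
    order = sorted(range(len(S)), key=lambda i: math.dist(S[i], t_s))
    votes = Counter(C[i] for i in order[:n_neighbors])
    return votes.most_common(1)[0][0]

# k = 6 training samples, each with m = 2 attributes; n = 2 class labels.
S = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
C = ["A", "A", "A", "B", "B", "B"]
t_s = (1.1, 1.0)  # unlabeled test case

c_t = knn_predict(S, C, t_s)
print(c_t)  # prints "A": the three nearest samples all carry label A
```

The number of neighbors is called `n_neighbors` here to avoid a clash with $k$, which this entry uses for the size of the training set.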

`Description:`

`Property:`

The result should be the same if we apply the same arbitrary affine transformation function $f(x)=kx+b$ $(k \neq 0)$ to the values of any subset of attributes for each sample in the training data set $S$ and the test case $t_s$. Here $k$ denotes the slope of the transformation, not the size of the training set.

`Source input:`

`Source output:`

`Follow-up input:`

`Follow-up output:`

`Input relation:`

`Output relation:`

`Pattern:`
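
The affine-transformation property above can be checked along the following lines. This is a sketch under a stated assumption: the classifier min-max scales every attribute using the training-set range, since a plain Euclidean k-NN without scaling is generally not invariant when only a subset of attributes is transformed. The names `normalized_knn` and `affine` and the data are hypothetical.

```python
from collections import Counter
import math

def normalized_knn(S, C, t_s, n_neighbors=3):
    """k-NN after min-max scaling each attribute with the training-set range."""
    columns = list(zip(*S))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]

    def scale(sample):
        return [(v - lo) / span for v, lo, span in zip(sample, lows, spans)]

    target = scale(t_s)
    order = sorted(range(len(S)), key=lambda i: math.dist(scale(S[i]), target))
    return Counter(C[i] for i in order[:n_neighbors]).most_common(1)[0][0]

def affine(sample, attrs, slope, offset):
    """Apply f(x) = slope * x + offset to the attributes indexed by attrs."""
    return tuple(slope * v + offset if j in attrs else v
                 for j, v in enumerate(sample))

S = [(1.0, 10.0), (1.2, 12.0), (0.9, 11.0), (5.0, 50.0), (5.2, 49.0), (4.8, 51.0)]
C = ["A", "A", "A", "B", "B", "B"]
t_s = (1.1, 10.5)

source = normalized_knn(S, C, t_s)

# Follow-up: transform attribute 1 only, identically in S and in t_s.
S_f = [affine(s, {1}, slope=-2.5, offset=7.0) for s in S]
t_f = affine(t_s, {1}, slope=-2.5, offset=7.0)
follow_up = normalized_knn(S_f, C, t_f)

assert follow_up == source  # the relation holds on this toy data
```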

`Description:`

`Property:`

Assume that we have a class-label permutation function $Perm()$ that performs a one-to-one mapping from each class label in the set of labels $L$ to another label in $L$. If the source case result is $l_i$, then after applying the permutation function to the corresponding class labels $C$ for the follow-up case, the result of the follow-up case should be $Perm(l_i)$.

`Source input:`

`Source output:`

`Follow-up input:`

`Follow-up output:`

`Input relation:`

`Output relation:`

`Pattern:`
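
The label-permutation property above can be sketched as follows, with $Perm()$ modeled as a Python dictionary. The helper `knn_predict`, the dictionary `perm`, and the data are illustrative assumptions.

```python
from collections import Counter
import math

def knn_predict(S, C, t_s, n_neighbors=3):
    """Plain k-NN: Euclidean distance, majority vote over the nearest samples."""
    order = sorted(range(len(S)), key=lambda i: math.dist(S[i], t_s))
    return Counter(C[i] for i in order[:n_neighbors]).most_common(1)[0][0]

S = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.9), (9.0, 1.0), (9.1, 1.2)]
C = ["A", "A", "B", "B", "C", "C"]
t_s = (5.1, 5.0)

# One-to-one mapping from L onto L (a cyclic shift of the three labels).
perm = {"A": "B", "B": "C", "C": "A"}

source = knn_predict(S, C, t_s)

# Follow-up: same samples and test case, permuted class labels.
C_f = [perm[c] for c in C]
follow_up = knn_predict(S, C_f, t_s)

assert follow_up == perm[source]
```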

`Description:`

`Property:`

If we permute the $m$ attributes of all the samples and the test data, the output should remain unchanged.

`Source input:`

`Source output:`

`Follow-up input:`

`Follow-up output:`

`Input relation:`

`Output relation:`

`Pattern:`
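
A possible check of the attribute-permutation property above reorders the columns of every sample and of the test case in the same way. The helper names, the chosen column order, and the data are assumptions for illustration.

```python
from collections import Counter
import math

def knn_predict(S, C, t_s, n_neighbors=3):
    """Plain k-NN: Euclidean distance, majority vote over the nearest samples."""
    order = sorted(range(len(S)), key=lambda i: math.dist(S[i], t_s))
    return Counter(C[i] for i in order[:n_neighbors]).most_common(1)[0][0]

def permute(sample, column_order):
    """Reorder the attributes of one sample according to column_order."""
    return tuple(sample[j] for j in column_order)

S = [(1.0, 2.0, 3.0), (1.1, 2.1, 2.9), (0.9, 1.8, 3.2),
     (6.0, 7.0, 8.0), (6.2, 6.9, 8.1), (5.9, 7.2, 7.8)]
C = ["A", "A", "A", "B", "B", "B"]
t_s = (1.0, 2.0, 3.1)

source = knn_predict(S, C, t_s)

# Follow-up: the same permutation of the m = 3 attributes everywhere.
column_order = (2, 0, 1)
S_f = [permute(s, column_order) for s in S]
follow_up = knn_predict(S_f, C, permute(t_s, column_order))

assert follow_up == source
```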

`Description:`

`Property:`

An uninformative attribute is one that is equally associated with each class label. For the source input, suppose we get the result $c_t=l_i$ for the test case $t_s$. In the follow-up input, we add an uninformative attribute to each sample in $S$ and, correspondingly, a new attribute to $t_s$. The choice of the actual value to be added here is not important, as this attribute is equally associated with the class labels. The output of the follow-up test case should still be $l_i$.

`Source input:`

`Source output:`

`Follow-up input:`

`Follow-up output:`

`Input relation:`

`Output relation:`

`Pattern:`
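
One simple way to realize the uninformative attribute described above is to append the same constant to every sample and to the test case, so that the new attribute cannot favor any class. The constant `42.0`, the helper `knn_predict`, and the data are arbitrary assumptions.

```python
from collections import Counter
import math

def knn_predict(S, C, t_s, n_neighbors=3):
    """Plain k-NN: Euclidean distance, majority vote over the nearest samples."""
    order = sorted(range(len(S)), key=lambda i: math.dist(S[i], t_s))
    return Counter(C[i] for i in order[:n_neighbors]).most_common(1)[0][0]

S = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
C = ["A", "A", "A", "B", "B", "B"]
t_s = (1.1, 1.0)

source = knn_predict(S, C, t_s)

# Follow-up: a constant attribute is identical for every class,
# so it is equally associated with each class label.
uninformative = 42.0
S_f = [s + (uninformative,) for s in S]
t_f = t_s + (uninformative,)
follow_up = knn_predict(S_f, C, t_f)

assert follow_up == source
```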

`Description:`

`Property:`

For the source input, suppose we get the result $c_t=l_i$ for the test case $t_s$. In the follow-up input, we add an informative attribute to each sample in $S$ and $t_s$ such that this attribute is strongly associated with class $l_i$ and equally associated with all other classes. The output of the follow-up test case should still be $l_i$.

`Source input:`

`Source output:`

`Follow-up input:`

`Follow-up output:`

`Input relation:`

`Output relation:`

`Pattern:`
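
The informative-attribute property above could be exercised as in the sketch below. The encoding is an assumption: the new attribute is `1.0` for samples labeled $l_i$ and for the test case, and `0.0` for all other samples, which is only one of many ways to make an attribute strongly associated with $l_i$.

```python
from collections import Counter
import math

def knn_predict(S, C, t_s, n_neighbors=3):
    """Plain k-NN: Euclidean distance, majority vote over the nearest samples."""
    order = sorted(range(len(S)), key=lambda i: math.dist(S[i], t_s))
    return Counter(C[i] for i in order[:n_neighbors]).most_common(1)[0][0]

S = [(1.5, 1.5), (1.8, 2.3), (0.5, 0.5), (2.5, 2.4), (3.5, 3.5), (3.0, 2.8)]
C = ["A", "A", "A", "B", "B", "B"]
t_s = (2.0, 2.0)

source = knn_predict(S, C, t_s)  # plays the role of l_i

# Follow-up: the new attribute separates class l_i from every other class
# and treats all other classes alike.
S_f = [s + (1.0 if c == source else 0.0,) for s, c in zip(S, C)]
t_f = t_s + (1.0,)
follow_up = knn_predict(S_f, C, t_f)

assert follow_up == source
```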

`Description:`

`Property:`

For the source input, suppose we get the result $c_t=l_i$ for the test case $t_s$. In the follow-up input, we can append $t_s$ and $c_t$ to the end of $S$ and $C$ respectively. We call the new training dataset $S'$ and $C'$. We take $S'$, $C'$ and $t_s$ as the input of the follow-up case, and the output should still be $l_i$.

`Source input:`

`Source output:`

`Follow-up input:`

`Follow-up output:`

`Input relation:`

`Output relation:`

`Pattern:`
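
The re-prediction property above translates directly into list operations. In this sketch `S_prime` and `C_prime` stand for $S'$ and $C'$; the helper `knn_predict` and the data are illustrative assumptions.

```python
from collections import Counter
import math

def knn_predict(S, C, t_s, n_neighbors=3):
    """Plain k-NN: Euclidean distance, majority vote over the nearest samples."""
    order = sorted(range(len(S)), key=lambda i: math.dist(S[i], t_s))
    return Counter(C[i] for i in order[:n_neighbors]).most_common(1)[0][0]

S = [(1.5, 1.5), (1.8, 2.3), (0.5, 0.5), (2.5, 2.4), (3.5, 3.5), (3.0, 2.8)]
C = ["A", "A", "A", "B", "B", "B"]
t_s = (2.0, 2.0)

c_t = knn_predict(S, C, t_s)  # source output

# Follow-up: append the test case and its predicted label to the training data.
S_prime = S + [t_s]
C_prime = C + [c_t]
follow_up = knn_predict(S_prime, C_prime, t_s)

assert follow_up == c_t
```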

`Description:`

`Property:`

For the source input, suppose we get the result $c_t=l_i$ for the test case $t_s$. In the follow-up input, we duplicate all samples in $S$ with label $l_i$, as well as their associated labels in $C$. The output of the follow-up test case should still be $l_i$.

`Source input:`

`Source output:`

`Follow-up input:`

`Follow-up output:`

`Input relation:`

`Output relation:`

`Pattern:`
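
A sketch of the duplication property above: every sample whose label equals the source output is copied once, together with its label. The helper `knn_predict` and the data are assumptions made for illustration.

```python
from collections import Counter
import math

def knn_predict(S, C, t_s, n_neighbors=3):
    """Plain k-NN: Euclidean distance, majority vote over the nearest samples."""
    order = sorted(range(len(S)), key=lambda i: math.dist(S[i], t_s))
    return Counter(C[i] for i in order[:n_neighbors]).most_common(1)[0][0]

S = [(1.5, 1.5), (1.8, 2.3), (0.5, 0.5), (2.5, 2.4), (3.5, 3.5), (3.0, 2.8)]
C = ["A", "A", "A", "B", "B", "B"]
t_s = (2.0, 2.0)

source = knn_predict(S, C, t_s)  # plays the role of l_i

# Follow-up: duplicate every sample labeled l_i, and its label in C.
duplicates = [s for s, c in zip(S, C) if c == source]
S_f = S + duplicates
C_f = C + [source] * len(duplicates)
follow_up = knn_predict(S_f, C_f, t_s)

assert follow_up == source
```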

`Description:`

`Property:`

For the source input, suppose we get the result $c_t=l_i$ for the test case $t_s$. In the follow-up input, we duplicate all samples in $S$, and their labels in $C$, that do not have label $l_i$, and concatenate an arbitrary symbol “*” to the class labels of the duplicated samples. That is, if the original training sample set $S$ is associated with class labels $\langle A, B, C\rangle$ and $l_i$ is $A$, the set of classes in $S$ in the follow-up input could be $\langle A, B, C, B*, C*\rangle$. The output of the follow-up test case should still be $l_i$. A derivative of this metamorphic relation is that duplicating all samples from any number of classes that do not have label $l_i$ should not change the output of the follow-up test case.

`Source input:`

`Source output:`

`Follow-up input:`

`Follow-up output:`

`Input relation:`

`Output relation:`

`Pattern:`

`Description:`

`Property:`

For the source input, suppose we get the result $c_t=l_i$ for the test case $t_s$. In the follow-up input, we re-label some of the samples in $S$ whose labels are not $l_i$ by concatenating an arbitrary symbol “*” to their associated class labels in $C$. That is, if the original training set $S$ is associated with class labels $\langle A, B, B, B, C, C, C\rangle$ and $l_i$ is $A$, the set of classes in $S$ in the follow-up input may become $\langle A, B, B, B*, C, C*, C*\rangle$. The output of the follow-up test case should still be $l_i$.

`Source input:`

`Source output:`

`Follow-up input:`

`Follow-up output:`

`Input relation:`

`Output relation:`

`Pattern:`
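
The re-labeling property above might be checked as follows. Which samples get re-labeled is a free choice; the index set `relabel` below, the helper `knn_predict`, and the data are hypothetical.

```python
from collections import Counter
import math

def knn_predict(S, C, t_s, n_neighbors=3):
    """Plain k-NN: Euclidean distance, majority vote over the nearest samples."""
    order = sorted(range(len(S)), key=lambda i: math.dist(S[i], t_s))
    return Counter(C[i] for i in order[:n_neighbors]).most_common(1)[0][0]

S = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.9), (9.0, 1.0), (9.1, 1.2)]
C = ["A", "A", "B", "B", "C", "C"]
t_s = (5.1, 5.0)

source = knn_predict(S, C, t_s)  # plays the role of l_i

# Follow-up: append "*" to the labels of some samples not labeled l_i.
relabel = {1, 4}
assert all(C[i] != source for i in relabel)
C_f = [c + "*" if i in relabel else c for i, c in enumerate(C)]
follow_up = knn_predict(S, C_f, t_s)

assert follow_up == source
```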

`Description:`

`Property:`

For the source input, suppose we get the result $c_t=l_i$ for the test case $t_s$. In the follow-up input, we remove one entire class of samples from $S$ whose label is not $l_i$. That is, if the original training sample set $S$ is associated with class labels $\langle A, A, B, B, C, C\rangle$ and $l_i$ is $A$, the set of classes in $S$ in the follow-up input may become $\langle A, A, B, B\rangle$. The output of the follow-up test case should still be $l_i$.

`Source input:`

`Source output:`

`Follow-up input:`

`Follow-up output:`

`Input relation:`

`Output relation:`

`Pattern:`

`Description:`

`Property:`

For the source input, suppose we get the result $c_t=l_i$ for the test case $t_s$. In the follow-up input, we remove some of the samples whose label is not $l_i$ from $S$, along with their labels in $C$. That is, if the original training set $S$ is associated with class labels $\langle A, A, B, B, C, C\rangle$ and $l_i$ is $A$, the set of classes in $S$ in the follow-up input may become $\langle A, A, B, C\rangle$. The output of the follow-up test case should still be $l_i$.

`Source input:`

`Source output:`

`Follow-up input:`

`Follow-up output:`

`Input relation:`

`Output relation:`

`Pattern:`