# Supervised classification algorithms

### Program Information

Name: Supervised classification algorithms
Domain: Machine learning
Functionality: k-nearest neighbors (kNN) and the Naïve Bayes classifier (NBC).
Input: An unlabeled test case $t_s$ (together with the labeled training data described below)
Output: The class label $c_t$ assigned to $t_s$

#### Reference

Xie, X., Ho, J. W. K., Murphy, C., Kaiser, G., Xu, B., Chen, T. Y.: Testing and validating machine learning classifiers by metamorphic testing. Journal of Systems and Software 84(4), 544–558 (2011). https://doi.org/10.1016/j.jss.2010.11.920

### MR Information

The training data can be represented by two vectors of size $k$: one for the $k$ training samples, $S=\langle s_0,s_1,\ldots,s_{k-1}\rangle$, and one for the corresponding class labels, $C=\langle c_0,c_1,\ldots,c_{k-1}\rangle$. Each sample $s\in S$ is a vector of size $m$ representing the $m$ attributes $\langle att_0,att_1,\ldots,att_{m-1}\rangle$ from which to learn. Each label $c_i$ in $C$ is an element of a finite set of class labels, that is, $c_i \in L=\{l_0,l_1,\ldots,l_{n-1}\}$, where $n$ is the number of possible class labels. The value of the test case $t_s$ is $\langle a_0,a_1,\ldots,a_{m-1}\rangle$.
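As a concrete illustration, the vectors above can be written as plain Python lists (all values here are made up for illustration):

```python
# k = 4 training samples, each a vector of m = 2 attribute values
S = [[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]]

# Corresponding class labels, drawn from L = {"A", "B"} (so n = 2)
C = ["A", "A", "B", "B"]

# Unlabeled test case t_s: one value per attribute
t_s = [1.5, 1.5]

# Basic shape invariants implied by the definitions above
assert len(S) == len(C)
assert all(len(s) == len(t_s) for s in S)
```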

#### MR1------ Consistency with affine transformation

Description:
Property: The result should be unchanged if we apply the same arbitrary affine transformation $f(x)=kx+b$ (with $k \neq 0$) to the values of any subset of attributes, for each sample in the training data set $S$ and for the test case $t_s$.
Source input:
Source output:
Follow-up input:
Follow-up output:
Input relation:
Output relation:
Pattern:
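A minimal sketch of MR1, using a from-scratch 1-nearest-neighbour classifier on toy data (`knn_predict` and all values are assumptions for illustration, not the implementation studied in the paper). For Euclidean-distance kNN the transform is applied here to every attribute with the same $f$, which scales all distances uniformly and therefore preserves the prediction; applying it to a strict subset is the more general form stated above.

```python
import math

def knn_predict(S, C, t, k=1):
    """Toy k-NN: majority vote over the k nearest samples (Euclidean distance)."""
    order = sorted(range(len(S)), key=lambda i: math.dist(S[i], t))
    votes = {}
    for i in order[:k]:
        votes[C[i]] = votes.get(C[i], 0) + 1
    return max(votes, key=votes.get)

S = [[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]]
C = ["A", "A", "B", "B"]
t = [1.5, 1.5]

f = lambda x: 3.0 * x + 7.0          # affine transform, slope != 0
S2 = [[f(v) for v in s] for s in S]  # follow-up training samples
t2 = [f(v) for v in t]               # follow-up test case

src = knn_predict(S, C, t)
follow = knn_predict(S2, C, t2)
assert follow == src                 # MR1: prediction is unchanged
```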

#### MR2------ Permutation of class labels

Description:
Property: Assume a class-label permutation function $Perm()$ that performs a one-to-one mapping from each class label in the set of labels $L$ to another label in $L$. If the result for the source case is $l_i$, then after applying the permutation function to the set of corresponding class labels $C$ in the follow-up case, the result of the follow-up case should be $Perm(l_i)$.
Source input:
Source output:
Follow-up input:
Follow-up output:
Input relation:
Output relation:
Pattern:
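A sketch of MR2 with the same toy 1-NN stand-in (classifier and data are assumed for illustration): permuting the training labels with `Perm` should permute the prediction accordingly.

```python
import math

def knn_predict(S, C, t, k=1):
    """Toy k-NN: majority vote over the k nearest samples (Euclidean distance)."""
    order = sorted(range(len(S)), key=lambda i: math.dist(S[i], t))
    votes = {}
    for i in order[:k]:
        votes[C[i]] = votes.get(C[i], 0) + 1
    return max(votes, key=votes.get)

S = [[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]]
C = ["A", "A", "B", "B"]
t = [1.5, 1.5]

Perm = {"A": "B", "B": "A"}      # one-to-one mapping on L
C2 = [Perm[c] for c in C]        # follow-up labels

src = knn_predict(S, C, t)
follow = knn_predict(S, C2, t)
assert follow == Perm[src]       # MR2: output is permuted the same way
```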

#### MR3------ Permutation of attributes

Description:
Property: If we permute the $m$ attributes of all the samples and of the test data in the same way, the output should remain unchanged.
Source input:
Source output:
Follow-up input:
Follow-up output:
Input relation:
Output relation:
Pattern:
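A sketch of MR3 with a toy 1-NN stand-in (all names and data assumed): reordering the attribute columns identically in $S$ and $t_s$ leaves the Euclidean distances, and hence the prediction, unchanged.

```python
import math

def knn_predict(S, C, t, k=1):
    """Toy k-NN: majority vote over the k nearest samples (Euclidean distance)."""
    order = sorted(range(len(S)), key=lambda i: math.dist(S[i], t))
    votes = {}
    for i in order[:k]:
        votes[C[i]] = votes.get(C[i], 0) + 1
    return max(votes, key=votes.get)

S = [[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]]
C = ["A", "A", "B", "B"]
t = [1.5, 1.5]

perm = [1, 0]                            # swap the two attributes
S2 = [[s[j] for j in perm] for s in S]
t2 = [t[j] for j in perm]

src = knn_predict(S, C, t)
follow = knn_predict(S2, C, t2)
assert follow == src                     # MR3: prediction is unchanged
```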

#### MR4------ Addition of uninformative attributes

Description:
Property: An uninformative attribute is one that is equally associated with each class label. For the source input, suppose we get the result $c_t=l_i$ for the test case $t_s$. In the follow-up input, we add an uninformative attribute to each sample in $S$ and, correspondingly, a new attribute to $t_s$. The choice of the actual value added is not important, as this attribute is equally associated with all class labels. The output of the follow-up test case should still be $l_i$.
Source input:
Source output:
Follow-up input:
Follow-up output:
Input relation:
Output relation:
Pattern:
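A sketch of MR4 with a toy 1-NN stand-in (assumed names and data). Here the uninformative attribute is realised as a constant column — the same value for every sample and for $t_s$ — which is trivially equally associated with every class and contributes nothing to the distances.

```python
import math

def knn_predict(S, C, t, k=1):
    """Toy k-NN: majority vote over the k nearest samples (Euclidean distance)."""
    order = sorted(range(len(S)), key=lambda i: math.dist(S[i], t))
    votes = {}
    for i in order[:k]:
        votes[C[i]] = votes.get(C[i], 0) + 1
    return max(votes, key=votes.get)

S = [[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]]
C = ["A", "A", "B", "B"]
t = [1.5, 1.5]

# Uninformative attribute: one constant for everyone
S2 = [s + [5.0] for s in S]
t2 = t + [5.0]

src = knn_predict(S, C, t)
follow = knn_predict(S2, C, t2)
assert follow == src                 # MR4: prediction is unchanged
```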

#### MR5------ Addition of informative attributes

Description:
Property: For the source input, suppose we get the result $c_t=l_i$ for the test case $t_s$ . In the follow-up input, we add an informative attribute to each sample in $S$ and $t_s$ such that this attribute is strongly associated with class $l_i$ and equally associated with all other classes. The output of the follow-up test case should still be $l_i$ .
Source input:
Source output:
Follow-up input:
Follow-up output:
Input relation:
Output relation:
Pattern:
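A sketch of MR5 with a toy 1-NN stand-in (assumed names and data). The informative attribute is realised here as a column that takes the value 1.0 for samples of the predicted class (and for $t_s$) and 0.0 for every other class — strongly associated with $l_i$, equally associated with the rest.

```python
import math

def knn_predict(S, C, t, k=1):
    """Toy k-NN: majority vote over the k nearest samples (Euclidean distance)."""
    order = sorted(range(len(S)), key=lambda i: math.dist(S[i], t))
    votes = {}
    for i in order[:k]:
        votes[C[i]] = votes.get(C[i], 0) + 1
    return max(votes, key=votes.get)

S = [[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]]
C = ["A", "A", "B", "B"]
t = [1.5, 1.5]

src = knn_predict(S, C, t)           # suppose this is l_i

# Informative attribute: 1.0 for class src (and for t_s), 0.0 otherwise
S2 = [s + [1.0 if c == src else 0.0] for s, c in zip(S, C)]
t2 = t + [1.0]

follow = knn_predict(S2, C, t2)
assert follow == src                 # MR5: prediction is unchanged
```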

#### MR6------ Consistency with re-prediction

Description:
Property: For the source input, suppose we get the result $c_t=l_i$ for the test case $t_s$ . In the follow-up input, we can append $t_s$ and $c_t$ to the end of $S$ and $C$ respectively. We call the new training dataset $S'$ and $C'$ . We take $S'$ , $C'$ and $t_s$ as the input of the follow-up case, and the output should still be $l_i$.
Source input:
Source output:
Follow-up input:
Follow-up output:
Input relation:
Output relation:
Pattern:
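A sketch of MR6 with a toy 1-NN stand-in (assumed names and data): after appending $(t_s, c_t)$ to the training data, the test case is its own nearest neighbour, so re-predicting it must return the same label.

```python
import math

def knn_predict(S, C, t, k=1):
    """Toy k-NN: majority vote over the k nearest samples (Euclidean distance)."""
    order = sorted(range(len(S)), key=lambda i: math.dist(S[i], t))
    votes = {}
    for i in order[:k]:
        votes[C[i]] = votes.get(C[i], 0) + 1
    return max(votes, key=votes.get)

S = [[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]]
C = ["A", "A", "B", "B"]
t = [1.5, 1.5]

src = knn_predict(S, C, t)

# Append the test case with its predicted label to the training data
S2 = S + [t]
C2 = C + [src]

follow = knn_predict(S2, C2, t)
assert follow == src                 # MR6: re-prediction agrees
```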

#### MR7------ Duplication of training samples

Description:
Property: For the source input, suppose we get the result $c_t=l_i$ for the test case $t_s$. In the follow-up input, we duplicate all samples in $S$ with label $l_i$ , as well as their associated labels in $C$. The output of the follow-up test case should still be $l_i$ .
Source input:
Source output:
Follow-up input:
Follow-up output:
Input relation:
Output relation:
Pattern:
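A sketch of MR7 with a toy 1-NN stand-in (assumed names and data): duplicating every sample that carries the predicted label cannot move the nearest neighbour away from that class.

```python
import math

def knn_predict(S, C, t, k=1):
    """Toy k-NN: majority vote over the k nearest samples (Euclidean distance)."""
    order = sorted(range(len(S)), key=lambda i: math.dist(S[i], t))
    votes = {}
    for i in order[:k]:
        votes[C[i]] = votes.get(C[i], 0) + 1
    return max(votes, key=votes.get)

S = [[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]]
C = ["A", "A", "B", "B"]
t = [1.5, 1.5]

src = knn_predict(S, C, t)

# Duplicate every sample carrying the predicted label src
S2 = S + [s for s, c in zip(S, C) if c == src]
C2 = C + [c for c in C if c == src]

follow = knn_predict(S2, C2, t)
assert follow == src                 # MR7: prediction is unchanged
```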

#### MR8------ Addition of classes by duplicating samples

Description:
Property: For the source input, suppose we get the result $c_t =l_i$ for the test case $t_s$. In the follow-up input, we duplicate all samples in $S$ and $C$ that do not have label $l_i$ and concatenate an arbitrary symbol “*” to the class labels of the duplicated samples. That is, if the original training sample set $S$ is associated with class labels $\langle A, B, C\rangle$ and $l_i$ is $A$, the set of classes in $S$ in the follow-up input could be $\langle A, B, C, B*, C*\rangle$. The output of the follow-up test case should still be $l_i$. Another derivative of this metamorphic relation is that duplicating all samples from any number of classes which do not have label $l_i$ should not change the result of the output of the follow-up test case.
Source input:
Source output:
Follow-up input:
Follow-up output:
Input relation:
Output relation:
Pattern:
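A sketch of MR8 with a toy 1-NN stand-in (assumed names and data; string labels make the "*" concatenation direct): the duplicated samples sit at the same positions as their originals, so introducing the starred classes does not disturb a prediction of $l_i$.

```python
import math

def knn_predict(S, C, t, k=1):
    """Toy k-NN: majority vote over the k nearest samples (Euclidean distance)."""
    order = sorted(range(len(S)), key=lambda i: math.dist(S[i], t))
    votes = {}
    for i in order[:k]:
        votes[C[i]] = votes.get(C[i], 0) + 1
    return max(votes, key=votes.get)

S = [[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]]
C = ["A", "A", "B", "B"]
t = [1.5, 1.5]

src = knn_predict(S, C, t)

# Duplicate every sample whose label is not src, relabelling the copy
# by concatenating "*" (a brand-new class label)
S2 = S + [s for s, c in zip(S, C) if c != src]
C2 = C + [c + "*" for c in C if c != src]

follow = knn_predict(S2, C2, t)
assert follow == src                 # MR8: prediction is unchanged
```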

#### MR9------ Addition of classes by re-labeling samples

Description:
Property: For the source input, suppose we get the result $c_t=l_i$ for the test case $t_s$. In the follow-up input, we re-label some of the samples in $S$ with labels other than $l_i$, by concatenating an arbitrary symbol “*” to their associated class labels in $C$. That is, if the original training set $S$ is associated with class labels $\langle A, B, B, B, C, C, C\rangle$ and $l_i$ is $A$, the set of classes in $S$ in the follow-up input may become $\langle A, B, B, B*, C, C*, C*\rangle$. The output of the follow-up test case should still be $l_i$.
Source input:
Source output:
Follow-up input:
Follow-up output:
Input relation:
Output relation:
Pattern:
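A sketch of MR9 with a toy 1-NN stand-in (assumed names and data): re-labelling some non-$l_i$ samples into new starred classes leaves the samples themselves, and hence the nearest-neighbour computation, untouched.

```python
import math

def knn_predict(S, C, t, k=1):
    """Toy k-NN: majority vote over the k nearest samples (Euclidean distance)."""
    order = sorted(range(len(S)), key=lambda i: math.dist(S[i], t))
    votes = {}
    for i in order[:k]:
        votes[C[i]] = votes.get(C[i], 0) + 1
    return max(votes, key=votes.get)

S = [[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]]
C = ["A", "A", "B", "B"]
t = [1.5, 1.5]

src = knn_predict(S, C, t)

# Re-label one sample that does not carry label src by appending "*"
C2 = list(C)
C2[3] = C2[3] + "*"                  # one "B" becomes the new class "B*"

follow = knn_predict(S, C2, t)
assert follow == src                 # MR9: prediction is unchanged
```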

#### MR10------ Removal of classes

Description:
Property: For the source input, suppose we get the result $c_t =l_i$ for the test case $t_s$. In the follow-up input, we remove one entire class of samples in $S$ whose label is not $l_i$. That is, if the original training sample set $S$ is associated with class labels $\langle A, A, B, B, C, C\rangle$ and $l_i$ is $A$, the set of classes in $S$ in the follow-up input may become $\langle A, A, B, B\rangle$. The output of the follow-up test case should still be $l_i$.
Source input:
Source output:
Follow-up input:
Follow-up output:
Input relation:
Output relation:
Pattern:
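A sketch of MR10 with a toy 1-NN stand-in (assumed names and data; the removed class is hard-coded as `"B"`, which in this toy data differs from the predicted label): removing samples can only push competitors further down the ranking, so the nearest $l_i$ sample stays nearest.

```python
import math

def knn_predict(S, C, t, k=1):
    """Toy k-NN: majority vote over the k nearest samples (Euclidean distance)."""
    order = sorted(range(len(S)), key=lambda i: math.dist(S[i], t))
    votes = {}
    for i in order[:k]:
        votes[C[i]] = votes.get(C[i], 0) + 1
    return max(votes, key=votes.get)

S = [[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]]
C = ["A", "A", "B", "B"]
t = [1.5, 1.5]

src = knn_predict(S, C, t)

# Remove one entire class other than src (here: every "B" sample)
kept = [(s, c) for s, c in zip(S, C) if c != "B"]
S2 = [s for s, _ in kept]
C2 = [c for _, c in kept]

follow = knn_predict(S2, C2, t)
assert follow == src                 # MR10: prediction is unchanged
```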

#### MR11------ Removal of samples

Description:
Property: For the source input, suppose we get the result $c_t =l_i$ for the test case $t_s$. In the follow-up input, we remove some of the samples in $S$ and $C$ whose label is not $l_i$. That is, if the original training set $S$ is associated with class labels $\langle A, A, B, B, C, C\rangle$ and $l_i$ is $A$, the set of classes in $S$ in the follow-up input may become $\langle A, A, B, C\rangle$. The output of the follow-up test case should still be $l_i$.
Source input:
Source output:
Follow-up input:
Follow-up output:
Input relation:
Output relation:
Pattern:
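A sketch of MR11 with a toy 1-NN stand-in (assumed names and data): unlike MR10, only individual non-$l_i$ samples are dropped rather than a whole class, and the prediction should still be unaffected.

```python
import math

def knn_predict(S, C, t, k=1):
    """Toy k-NN: majority vote over the k nearest samples (Euclidean distance)."""
    order = sorted(range(len(S)), key=lambda i: math.dist(S[i], t))
    votes = {}
    for i in order[:k]:
        votes[C[i]] = votes.get(C[i], 0) + 1
    return max(votes, key=votes.get)

S = [[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]]
C = ["A", "A", "B", "B"]
t = [1.5, 1.5]

src = knn_predict(S, C, t)

# Remove one sample (not a whole class) whose label is not src
S2 = S[:3]                           # drops the last "B" sample
C2 = C[:3]

follow = knn_predict(S2, C2, t)
assert follow == src                 # MR11: prediction is unchanged
```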