Feature Selection

### Program Information

Name: Feature Selection
Domain: Algorithm
Functionality: Feature selection (FS) reduces the dimensionality of the training sample set by removing irrelevant features. The aim is to increase learning accuracy and make the results easier to interpret. We focus on the filter method, which evaluates the relevance of features by examining intrinsic properties of the training data.
Input: A training sample set whose samples are described by the feature vector $F=\langle F_1,F_2,\ldots ,F_n\rangle$ and labeled with classes from $L=\{L_1,L_2,\ldots ,L_k\}$.
Output: A subset of the features in $F$.
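
To make the filter idea concrete, here is a minimal Python sketch of one simple filter criterion. Each feature is scored by the absolute Pearson correlation between its column and a numeric class label, and the `top` highest-scoring features are kept. The function names, the list-of-rows data layout and the `top` parameter are illustrative assumptions; this is not the evaluation or search component studied in the reference.

```python
import math
from typing import List, Sequence, Set


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def filter_select(X: List[List[float]], y: List[float], top: int) -> Set[int]:
    """Toy filter: keep the `top` columns with the largest |correlation| to y."""
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        scores.append((abs(pearson(column, y)), j))
    # Highest score first; ties are broken by the lower column index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return {j for _, j in scores[:top]}
```

A constant column scores 0.0 under this criterion, so it is never ranked above a feature that varies with the label.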

#### Reference

Bottom-up Integration Testing with the Technique of Metamorphic Testing https://doi.org/10.1109/QSIC.2014.29

### MR Information

We define two categories of MRs according to the two components of an FS system, namely the search component and the evaluation component. **Category A**, denoted $C_A$, contains all MRs that satisfy both of the following: (i) the changes made to the training samples do not affect the merit that the evaluation component returns for any feature subset; (ii) the search space of the search component remains unchanged, in both its size and the position of each feature subset. **Category B**, denoted $C_B$, contains MRs involving any changes other than those in $C_A$. MRs in $C_B$ are based on necessary properties of the entire FS system, that is, of the particular combination of evaluation and search components under investigation.

#### MR1------Affine transformation on continuous features.

Description: An MR in category $C_A$.
Property: We apply the same affine transformation $f(x)=kx+b$ (with arbitrary $k\neq 0$ and $b$) to the value of every continuous feature in every training sample. Then, the follow-up output feature subset should be the same as the source output.
Source input:
Source output:
Follow-up input:
Follow-up output:
Input relation:
Output relation:
Pattern:
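
A minimal Python sketch of how MR1's follow-up input could be built and its output relation checked. It assumes the training set is a list of rows `X`, the labels are a list `y`, the FS system under test is wrapped as a hypothetical callable `select(X, y)` returning the set of selected column indices, and `continuous` is a hypothetical set naming the continuous columns.

```python
from typing import Callable, List, Set

Rows = List[List[float]]
Selector = Callable[[Rows, list], Set[int]]


def mr1_followup(X: Rows, continuous: Set[int], k: float, b: float) -> Rows:
    """Apply f(x) = k*x + b to every continuous feature column of X."""
    if k == 0:
        raise ValueError("MR1 requires k != 0")
    return [
        [k * v + b if j in continuous else v for j, v in enumerate(row)]
        for row in X
    ]


def check_mr1(select: Selector, X: Rows, y: list,
              continuous: Set[int], k: float, b: float) -> bool:
    """Return True if the follow-up output equals the source output."""
    source_output = select(X, y)
    followup_output = select(mr1_followup(X, continuous, k, b), y)
    return followup_output == source_output
```

With a real selector, floating-point rounding after the transformation could reorder features whose scores are nearly tied, so a reported violation is worth inspecting before it is treated as a fault.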

#### MR2------Affine transformation on the continuous class.

Description: An MR in category $C_A$.
Property: We apply the same affine transformation $f(x)=kx+b$ (with arbitrary $k\neq 0$ and $b$) to the value of the continuous class label in every training sample. Then, the follow-up output feature subset should be the same as the source output.
Source input:
Source output:
Follow-up input:
Follow-up output:
Input relation:
Output relation:
Pattern:
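
A minimal Python sketch of MR2, assuming the continuous class labels are a list of floats `y`, the samples are a list of rows `X`, and the FS system is wrapped as a hypothetical callable `select(X, y)` that returns the set of selected column indices. Only the labels change; the feature values are passed through untouched.

```python
from typing import Callable, List, Set

Rows = List[List[float]]
Selector = Callable[[Rows, List[float]], Set[int]]


def mr2_followup(y: List[float], k: float, b: float) -> List[float]:
    """Apply f(x) = k*x + b to the continuous class label of every sample."""
    if k == 0:
        raise ValueError("MR2 requires k != 0")
    return [k * v + b for v in y]


def check_mr2(select: Selector, X: Rows, y: List[float],
              k: float, b: float) -> bool:
    """Return True if the follow-up output equals the source output."""
    return select(X, mr2_followup(y, k, b)) == select(X, y)
```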

#### MR3------Permutation on training samples

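Description: An MR in category $C_A$.
Property: We permute the order of some (or all) of the samples in the training set, with each sample keeping its class label. Then, the follow-up output feature subset should be the same as the source output.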
Source input:
Source output:
Follow-up input:
Follow-up output:
Input relation:
Output relation:
Pattern:
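
A minimal Python sketch of MR3, assuming rows `X`, labels `y`, and a hypothetical callable `select(X, y)` that returns the set of selected column indices. The sketch shuffles the whole sample order with a seeded `random.Random`, so a failing case can be reproduced; rows and labels are reordered together.

```python
import random
from typing import Callable, List, Set, Tuple

Rows = List[List[float]]
Selector = Callable[[Rows, list], Set[int]]


def mr3_followup(X: Rows, y: list, seed: int = 0) -> Tuple[Rows, list]:
    """Reorder the samples; each row moves together with its own label."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    return [list(X[i]) for i in order], [y[i] for i in order]


def check_mr3(select: Selector, X: Rows, y: list, seed: int = 0) -> bool:
    """Return True if the follow-up output equals the source output."""
    X2, y2 = mr3_followup(X, y, seed)
    return select(X2, y2) == select(X, y)
```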

#### MR4------Addition of an uninformative sample.

Description: An MR in category $C_A$.
Property: We add one sample to the training set whose value for each feature equals the average value of that feature over all training samples. Then, the follow-up output feature subset should be the same as the source output.
Source input:
Source output:
Follow-up input:
Follow-up output:
Input relation:
Output relation:
Pattern:
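
A minimal Python sketch of MR4, assuming rows `X`, labels `y`, and a hypothetical callable `select(X, y)` that returns the set of selected column indices. The property does not state which class label the added sample receives, so the sketch takes it as a caller-supplied parameter `label`; that choice is an assumption.

```python
from typing import Callable, List, Set, Tuple

Rows = List[List[float]]
Selector = Callable[[Rows, list], Set[int]]


def mr4_followup(X: Rows, y: list, label) -> Tuple[Rows, list]:
    """Append one sample whose features are the column means of X."""
    n = len(X)
    mean_row = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    # The label of the added sample is an assumption supplied by the caller.
    return [list(row) for row in X] + [mean_row], list(y) + [label]


def check_mr4(select: Selector, X: Rows, y: list, label) -> bool:
    """Return True if the follow-up output equals the source output."""
    X2, y2 = mr4_followup(X, y, label)
    return select(X2, y2) == select(X, y)
```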

#### MR5------Duplication of training samples with the original class label.

Description: An MR in category $C_A$.
Property: We duplicate all training samples $m$ times ($m \geq 1$), keeping their original class labels unchanged. Then, the follow-up output feature subset should be the same as the source output.
Source input:
Source output:
Follow-up input:
Follow-up output:
Input relation:
Output relation:
Pattern:
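
A minimal Python sketch of MR5, assuming rows `X`, labels `y`, and a hypothetical callable `select(X, y)` that returns the set of selected column indices. The property leaves open whether the duplication count includes the originals; the sketch assumes the follow-up set holds the originals plus `times` extra copies of every sample.

```python
from typing import Callable, List, Set, Tuple

Rows = List[List[float]]
Selector = Callable[[Rows, list], Set[int]]


def mr5_followup(X: Rows, y: list, times: int) -> Tuple[Rows, list]:
    """Keep the original samples and add `times` further copies of all of them."""
    if times < 1:
        raise ValueError("MR5 requires at least one duplication")
    copies = times + 1  # the originals plus `times` duplicates
    X2 = [list(row) for _ in range(copies) for row in X]
    y2 = [label for _ in range(copies) for label in y]
    return X2, y2


def check_mr5(select: Selector, X: Rows, y: list, times: int) -> bool:
    """Return True if the follow-up output equals the source output."""
    X2, y2 = mr5_followup(X, y, times)
    return select(X2, y2) == select(X, y)
```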

#### MR6------Duplication of a sample with assignment of a new class label to it.

Description: An MR in category $C_A$.
Property: We duplicate any one sample in the training set and assign the copy a new class label that does not belong to $L$. Then, the follow-up output feature subset should be the same as the source output.
Source input:
Source output:
Follow-up input:
Follow-up output:
Input relation:
Output relation:
Pattern:
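
A minimal Python sketch of MR6, assuming rows `X`, labels `y`, and a hypothetical callable `select(X, y)` that returns the set of selected column indices. The label set $L$ is passed explicitly as the hypothetical parameter `labels`, so the sketch can reject a "new" label that already belongs to $L$.

```python
from typing import Callable, List, Set, Tuple

Rows = List[List[float]]
Selector = Callable[[Rows, list], Set[int]]


def mr6_followup(X: Rows, y: list, labels: set, index: int,
                 new_label) -> Tuple[Rows, list]:
    """Append a copy of sample `index` carrying a label outside `labels`."""
    if new_label in labels:
        raise ValueError("the new class label must not belong to L")
    return [list(row) for row in X] + [list(X[index])], list(y) + [new_label]


def check_mr6(select: Selector, X: Rows, y: list, labels: set,
              index: int, new_label) -> bool:
    """Return True if the follow-up output equals the source output."""
    X2, y2 = mr6_followup(X, y, labels, index, new_label)
    return select(X2, y2) == select(X, y)
```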

#### MR7------Addition of an uninformative feature.

Description: An MR in category $C_B$.
Property: We append an uninformative feature $F_{new}$ to the end of the source feature vector $F$, such that all samples have the same value for $F_{new}$. Then, the follow-up output feature subset should be the same as the source output.
Source input:
Source output:
Follow-up input:
Follow-up output:
Input relation:
Output relation:
Pattern:
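
A minimal Python sketch of MR7, assuming rows `X`, labels `y`, and a hypothetical callable `select(X, y)` that returns the set of selected column indices. The constant `value` given to the new feature is an arbitrary choice.

```python
from typing import Callable, List, Set

Rows = List[List[float]]
Selector = Callable[[Rows, list], Set[int]]


def mr7_followup(X: Rows, value: float = 0.0) -> Rows:
    """Append a constant column: every sample gets the same value for F_new."""
    return [list(row) + [value] for row in X]


def check_mr7(select: Selector, X: Rows, y: list, value: float = 0.0) -> bool:
    """Return True if the follow-up output equals the source output."""
    # F_new is appended last, so the indices of the original features are
    # unchanged and the two outputs can be compared directly.
    return select(mr7_followup(X, value), y) == select(X, y)
```

Appending at the end rather than inserting elsewhere is what makes the direct set comparison valid; inserting the column in the middle would shift indices and require a remapping step.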

#### MR8------Duplication of any unselected feature.

Description: An MR in category $C_B$.
Property: Suppose feature $F_i$ does not appear in the source output feature subset. We duplicate the whole column of $F_i$ in the original training sample set; that is, a copy of $F_i$ is appended to the end of the original vector $F$ as a new feature, and each sample's value for the new feature equals its value for $F_i$. Then, the follow-up output feature subset should be the same as the source output.
Source input:
Source output:
Follow-up input:
Follow-up output:
Input relation:
Output relation:
Pattern:
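
A minimal Python sketch of MR8, assuming rows `X`, labels `y`, a hypothetical callable `select(X, y)` that returns the set of selected column indices, and a 0-based column index `i` standing for $F_i$. The check first runs the source input to confirm the precondition that $F_i$ was not selected.

```python
from typing import Callable, List, Set

Rows = List[List[float]]
Selector = Callable[[Rows, list], Set[int]]


def mr8_followup(X: Rows, i: int) -> Rows:
    """Append a copy of column i to the end of every row."""
    return [list(row) + [row[i]] for row in X]


def check_mr8(select: Selector, X: Rows, y: list, i: int) -> bool:
    """Return True if the follow-up output equals the source output."""
    source_output = select(X, y)
    if i in source_output:
        raise ValueError("MR8 needs a feature the source run did not select")
    return select(mr8_followup(X, i), y) == source_output
```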

#### MR9------Duplication of any selected feature.

Description: An MR in category $C_B$.
Property: Suppose feature $F_i$ appears in the source output feature subset. We duplicate the whole column of $F_i$ in the original training sample set; that is, a copy of $F_i$ is appended to the end of the original vector $F$ as a new feature, and each sample's value for the new feature equals its value for $F_i$. Then, the follow-up output feature subset should be the same as the source output.
Source input:
Source output:
Follow-up input:
Follow-up output:
Input relation:
Output relation:
Pattern:
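
A minimal Python sketch of MR9, mirroring the precondition in reverse: $F_i$ (the 0-based column `i`) must appear in the source output. It assumes rows `X`, labels `y`, and a hypothetical callable `select(X, y)` that returns the set of selected column indices. Because the copy is appended last, a selector that also picks the duplicate (index `len(F)`) yields a different set and the check reports a violation.

```python
from typing import Callable, List, Set

Rows = List[List[float]]
Selector = Callable[[Rows, list], Set[int]]


def mr9_followup(X: Rows, i: int) -> Rows:
    """Append a copy of column i to the end of every row."""
    return [list(row) + [row[i]] for row in X]


def check_mr9(select: Selector, X: Rows, y: list, i: int) -> bool:
    """Return True if the follow-up output equals the source output."""
    source_output = select(X, y)
    if i not in source_output:
        raise ValueError("MR9 needs a feature the source run selected")
    return select(mr9_followup(X, i), y) == source_output
```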

#### MR10------Deletion of any unselected feature

Description: An MR in category $C_B$.
Property: Suppose feature $F_i$ does not appear in the source output feature subset. We delete the whole column of $F_i$ from the original training sample set; that is, $F_i$ is removed from the original vector $F$, and the values of all samples under $F_i$ are removed from the training sample set. Then, the follow-up output feature subset should be the same as the source output.
Source input:
Source output:
Follow-up input:
Follow-up output:
Input relation:
Output relation:
Pattern:
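
A minimal Python sketch of MR10, assuming rows `X`, labels `y`, a hypothetical callable `select(X, y)` that returns the set of selected column indices, and a 0-based column index `i` standing for $F_i$. Deleting a column shifts every later column one position to the left, so the follow-up output is mapped back to the source indexing before the two subsets are compared.

```python
from typing import Callable, List, Set

Rows = List[List[float]]
Selector = Callable[[Rows, list], Set[int]]


def mr10_followup(X: Rows, i: int) -> Rows:
    """Remove column i from every row."""
    return [row[:i] + row[i + 1:] for row in X]


def check_mr10(select: Selector, X: Rows, y: list, i: int) -> bool:
    """Return True if the follow-up output, re-indexed, equals the source output."""
    source_output = select(X, y)
    if i in source_output:
        raise ValueError("MR10 needs a feature the source run did not select")
    followup_output = select(mr10_followup(X, i), y)
    # Follow-up column j is source column j (if j < i) or j + 1 (if j >= i).
    mapped = {j if j < i else j + 1 for j in followup_output}
    return mapped == source_output
```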