Ensemble Classifier Modelling for Dealing with Missing Values

Hasan, Mohammad Rajib
Narayanan, Ajit
Sarkar, Nurul
Degree name: Master of Philosophy
Auckland University of Technology

Ensemble classifiers are considered among the most capable methods for classifying life-critical data that suffers from missing values. The performance of a decision tree classifier can be improved by the ensemble method, which has been found superior to single classifiers. Nevertheless, the performance of an ensemble classifier depends on data quality and on the presence of missing values. In this study, we find that better classification accuracy is often achieved through missing value imputation. However, medical experts do not have confidence in missing value imputation (filling in missing values using a statistical method), because each case and attribute is unique and admits different possibilities. Imputing missing values in life-critical data may lead to a wrong diagnosis and thereby misdirect medical decision making, which is dangerous and life threatening. This study therefore proposes a new ensemble model that achieves an accuracy of over 96 percent without missing value imputation. The relevance of features such as HPV, HIV, AIDS, and smoking to cervical cancer has long been debated. This study selected some of these influential features and validated their relevance in terms of accuracy, using root mean squared error and mean absolute error as statistical error measures. It also considers true-positive and false-positive rates alongside accuracy. Finally, the study concludes that missing value imputation in life-critical data may not be necessary to obtain better accuracy, and that the selection of base classifiers in the ensemble method should take priority over missing value imputation.
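
The evaluation measures named in the abstract can be illustrated with a short sketch. It assumes binary labels (1 = positive, 0 = negative) and, for the error measures, predicted probabilities of the positive class. The function names and the sample data are illustrative only and are not taken from the thesis.

```python
import math


def error_metrics(y_true, y_prob):
    """Return (MAE, RMSE) between 0/1 labels and predicted probabilities."""
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_prob)) / n
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / n)
    return mae, rmse


def rates(y_true, y_pred):
    """Return (true-positive rate, false-positive rate, accuracy)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    accuracy = (tp + tn) / len(pairs)
    return tpr, fpr, accuracy


# Hypothetical toy data, for illustration only.
y_true = [1, 0, 1, 0]
y_prob = [1.0, 0.0, 0.5, 0.5]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 threshold

print(error_metrics(y_true, y_prob))
print(rates(y_true, y_pred))
```

Reporting the true-positive and false-positive rates next to accuracy matters for life-critical data, because a high overall accuracy can hide a poor detection rate on the rarer positive class.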

Ensemble, Missing value, Classifier, Machine learning, Cervical cancer, Ensemble_rh