Risk Prediction of Chronic Disease Using Machine Learning and Rebalancing Methods
Chronic diseases damage vital organs such as the brain, heart, and liver. They can readily lead to disability, impair the ability to work, and reduce quality of life, and their extremely high medical costs increase the economic burden on families and society. One effective response is to build predictive models that assess the risk of chronic disease. Researchers have conducted several such projects, but challenges remain.
A key challenge is class imbalance in chronic disease data. When trained on imbalanced chronic disease data, classification algorithms tend to favor the majority class (non-disease), while minority-class samples (disease) are largely ignored. To accurately distinguish disease from non-disease individuals, this research proposes a multi-combination method for handling chronic disease data sets with imbalanced classes. We analyzed in depth how three rebalancing methods, the synthetic minority oversampling technique (SMOTE), Resample, and SpreadSubsampling, affect classifier performance, using six classifiers and four data sets. Experimental results show that Random Forest (RF) combined with the Resample rebalancing method (RF-RESAMPLE) performed best on our selected data sets, achieving 94.8770%. The method can help doctors identify chronic diseases and then diagnose and treat patients early, improving their chances of survival.
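To make the three rebalancing ideas concrete, the following is a minimal standard-library sketch, not the implementation used in this study. The function names (`resample_balanced`, `undersample_majority`, `smote_like`), their parameters, and the toy data (90 non-disease rows, 10 disease rows) are all hypothetical and chosen only for illustration. The sketch shows random resampling with replacement toward a uniform class distribution, undersampling of the majority class, and SMOTE-style interpolation between minority neighbours. Classifier training is omitted.

```python
import math
import random
from collections import Counter


def _group_by_class(X, y):
    """Collect feature rows under their class label."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return groups


def resample_balanced(X, y, seed=0):
    """Oversample with replacement until every class matches the largest one."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(rows + extra)
        y_out.extend([label] * target)
    return X_out, y_out


def undersample_majority(X, y, max_ratio=1.0, seed=0):
    """Cap each class at max_ratio times the size of the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    cap = int(min(len(rows) for rows in groups.values()) * max_ratio)
    X_out, y_out = [], []
    for label, rows in groups.items():
        kept = rows if len(rows) <= cap else rng.sample(rows, cap)
        X_out.extend(kept)
        y_out.extend([label] * len(kept))
    return X_out, y_out


def smote_like(minority, n_new, k=3, seed=0):
    """Create synthetic rows on segments between a minority row and a near neighbour."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority rows by Euclidean distance, excluding the base row
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy data: label 1 (disease) is the minority class.
X = [[float(i), float(i % 7)] for i in range(100)]
y = [1 if i >= 90 else 0 for i in range(100)]

X_over, y_over = resample_balanced(X, y)
X_under, y_under = undersample_majority(X, y)
minority_rows = [row for row, label in zip(X, y) if label == 1]
synthetic_rows = smote_like(minority_rows, n_new=20)

print(Counter(y), Counter(y_over), Counter(y_under), len(synthetic_rows))
```

In practice, rebalancing of this kind is usually applied to the training split only, so that evaluation data keep the original class distribution; whether the study followed that protocol is not stated here.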