Fuzzy Ontology Case-based Reasoning Approaches to Prediction of Cardiovascular Disease
Cardiovascular disease (CVD) is a major cause of morbidity and mortality. However, the regression models currently in wide use for CVD risk prediction are known to have a number of drawbacks, including inaccurate predictions for individuals and for other cohorts, inflexibility in handling interventions, the requirement for complete clinical data, an inability to deal with inaccurate, vague and uncertain data, and poor explanatory capacity.
This research therefore developed a novel prediction model named CRISK, short for CVD Risk, to predict the 10-year risk of CVD. The model combines fuzzy ontology with case-based reasoning (CBR). Fuzzy ontology helps to handle and store vague and uncertain data, which is common in real life, while CBR, by retrieving the cases closest to an input case, supports the development of a personalised prediction model. CRISK retrieves the seven cases closest to the input case and generates its prediction outcomes from them; to do this, three algorithms, Retrieve, Reuse, and Revise, were developed. The CRISK model uses 13 risk factors, including total cholesterol, low-density lipoprotein (LDL) cholesterol, very-low-density lipoprotein (VLDL) cholesterol, systolic blood pressure (SBP), triglycerides, diastolic blood pressure (DBP), glucose, number of cigarettes smoked per day, high-density lipoprotein (HDL) cholesterol, hematocrit, body mass index (BMI), and lactate dehydrogenase (LDH). Compared with existing models, CRISK also introduces a new way to represent and interpret CVD prediction outcomes: they are expressed as membership values in the “High CVD Risk” and “Low CVD Risk” fuzzy sets, and the level of attention given to an input case depends on these values. This representation provides not only the predicted risk category but also an estimate of when CVD would occur.
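The Retrieve and Reuse steps described above can be illustrated with a minimal Python sketch. It is not the CRISK implementation: the fuzzy ontology is not modelled, and the distance measure (root-mean-square difference over normalised risk factors that both cases share), the similarity weighting, treating “Low CVD Risk” as the complement of “High CVD Risk”, and estimating time to CVD as the neighbours' mean are all assumptions, as are the names `Case`, `distance`, `retrieve`, `reuse`, and `years_to_event`. Skipping risk factors missing from either case shows one way a CBR model can avoid requiring complete clinical data.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical case representation (not CRISK's): risk-factor values are
# assumed to be pre-normalised to [0, 1]; None marks an unmeasured factor.
Features = Dict[str, Optional[float]]


@dataclass
class Case:
    features: Features
    had_cvd: bool                         # CVD event within the 10-year window
    years_to_cvd: Optional[float] = None  # time to the event, if one occurred


def distance(a: Features, b: Features) -> float:
    """Root-mean-square difference over risk factors present in BOTH cases.

    Assumption: missing values are skipped, so an incomplete query (e.g. one
    without triglycerides or LDH) can still be compared with stored cases.
    """
    shared = [k for k, v in a.items() if v is not None and b.get(k) is not None]
    if not shared:
        return math.inf  # nothing to compare on
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in shared) / len(shared))


def retrieve(query: Features, case_base: List[Case], k: int = 7) -> List[Case]:
    """Return the k stored cases closest to the query (CRISK uses seven)."""
    return sorted(case_base, key=lambda c: distance(query, c.features))[:k]


def reuse(query: Features, neighbours: List[Case]) -> Dict[str, float]:
    """Assumed rule: the similarity-weighted share of neighbours that had CVD
    is the "High CVD Risk" membership; "Low CVD Risk" is its complement."""
    # 1 / (1 + inf) evaluates to 0.0, so incomparable cases get zero weight.
    weights = [1.0 / (1.0 + distance(query, c.features)) for c in neighbours]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no usable neighbours (empty list or no shared risk factors)")
    high = sum(w for w, c in zip(weights, neighbours) if c.had_cvd) / total
    return {"High CVD Risk": high, "Low CVD Risk": 1.0 - high}


def years_to_event(neighbours: List[Case]) -> Optional[float]:
    """Assumed estimate of when CVD may occur: mean time to event among the
    neighbours that developed CVD, or None if none of them did."""
    times = [c.years_to_cvd for c in neighbours
             if c.had_cvd and c.years_to_cvd is not None]
    return sum(times) / len(times) if times else None


if __name__ == "__main__":
    # Made-up, already-normalised toy cases for illustration; not FHS data.
    toy = [
        Case({"sbp": 0.90, "ldl": 0.80, "ldh": 0.7}, True, 4.0),
        Case({"sbp": 0.85, "ldl": 0.75}, True, 6.0),
        Case({"sbp": 0.80, "ldl": 0.70}, False),
        Case({"sbp": 0.50, "ldl": 0.50}, True, 8.0),
        Case({"sbp": 0.30, "ldl": 0.30}, False),
        Case({"sbp": 0.20, "ldl": 0.20}, False),
        Case({"sbp": 0.10, "ldl": 0.10}, False),
        Case({"sbp": 0.00, "ldl": 0.00}, False),
    ]
    query = {"sbp": 0.88, "ldl": 0.78, "ldh": None}  # LDH not measured
    nearest = retrieve(query, toy)
    print(reuse(query, nearest), years_to_event(nearest))
```

Averaging the squared differences over only the shared risk factors keeps distances comparable between complete and incomplete cases, although it also means two cases that share few risk factors can appear deceptively close.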
The CRISK model achieved reasonably good predictions. For internal validation, the prediction performance results were True Positive Rate (TPR)=0.8733 (CI=0.0102), True Negative Rate (TNR)=0.8270 (CI=0.0116), Precision=0.2247 (CI=0.0128), F1 score=0.3574 (CI=0.0147), and Negative Predictive Value (NPV)=0.9913 (CI=0.0029), where CI is the 95% confidence interval. These results were obtained from experiments on the Framingham Heart Study (FHS) Offspring Cohort Exam 1 dataset, the dataset used to develop the CRISK model. For external validation, experiments were performed on the FHS Original Cohort Exam 11 dataset, which lacked two of the risk factors: triglycerides and LDH. The external validation results were TPR=0.8167 (CI=0.0434), TNR=0.5041 (CI=0.0560), Precision=0.2866 (CI=0.0507), F1 score=0.4242 (CI=0.0554), and NPV=0.9185 (CI=0.0307), with CI again denoting the 95% confidence interval. In addition, analysis showed that the CRISK model fully or partially solves five of the eight limitations of regression models identified in this research. CRISK also gave better prediction performance than two high-profile existing CVD prediction models.
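All five metrics reported above derive from the same confusion matrix. The Python sketch below shows their standard definitions. The interval function uses a normal-approximation (Wald) interval for a proportion, which is an assumption, since the section does not say how its confidence intervals were computed; the names `f1`, `confusion_metrics`, and `wald_half_width` are hypothetical.

```python
import math
from typing import Dict


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (TPR); 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard definitions of the five metrics reported for CRISK."""
    tpr = tp / (tp + fn)        # sensitivity / recall
    precision = tp / (tp + fp)  # positive predictive value
    return {
        "TPR": tpr,
        "TNR": tn / (tn + fp),  # specificity
        "Precision": precision,
        "F1": f1(precision, tpr),
        "NPV": tn / (tn + fn),  # negative predictive value
    }


def wald_half_width(p: float, n: int, z: float = 1.96) -> float:
    """Assumed CI method: normal-approximation half-width for a proportion p
    estimated from n trials (for TPR, n is the number of actual positives;
    z = 1.96 gives a 95% interval)."""
    return z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    # Recompute F1 from the reported internal-validation Precision and TPR.
    print(round(f1(0.2247, 0.8733), 4))  # 0.3574
```

Precision and NPV, unlike TPR and TNR, depend on how common CVD is in the cohort being evaluated, which is one reason to report all five metrics side by side.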
This research has shown the usefulness of fuzzy ontology CBR approaches in CVD prediction, and the results are promising. It would therefore be worth investing further in fuzzy ontology CBR approaches for building CVD prediction models specifically, and chronic disease prediction models generally. However, a prediction model should not be built once and used forever. Rather, experimentation should continue and the model should be updated as new datasets arrive, especially datasets from different ethnic groups. This would keep improving the model's prediction performance and keep the model up to date.