Biomedical Data Integration Framework for Glucose Level Prediction Using Machine Learning Techniques

Supervisor

Madanian, Samaneh
Niazi, Imran Khan
White, David

Item type

Thesis

Degree name

Doctor of Philosophy

Publisher

Auckland University of Technology

Abstract

Metabolic health conditions, characterized by elevated glucose levels, are significantly influenced by lifestyle factors such as diet, physical activity, and sleep. While Continuous Glucose Monitoring (CGM) devices provide essential insights into interstitial glucose (IG) levels, they often lack the broader physiological context necessary for a comprehensive metabolic assessment. This thesis investigates the potential of leveraging smartwatch data, specifically from devices like the Empatica E4, for IG prediction through machine learning (ML). It makes four key contributions: (1) identifying taxonomies and methodologies for handling time-domain data in healthcare applications; (2) reviewing current digital biomarkers used for glucose prediction from smartwatch data and food logs; (3) comparing various ML models, including Decision Tree (DT), Support Vector Machine (SVM), Random Forest (RF), Linear Discriminant Analysis (LDA), K-Nearest Neighbours (KNN), and Gaussian Naïve Bayes (GNB), to determine the most effective algorithms for predicting IG levels from wrist-worn smartwatches and dietary logs; and (4) developing novel sleep-related features derived from smartwatch data that significantly enhance IG prediction accuracy. A comprehensive systematic review of ML applications in time-domain electronic medical records identifies key models, features, and preprocessing techniques within the broader health data field. Additionally, a focused systematic review of digital biomarkers utilized in IG prediction highlights the necessity of comparing ML models and evaluating the utility of sleep biomarkers in glucose prediction. The investigation into the most effective models for IG prediction reveals that the RF model achieved the lowest Root Mean Squared Error (RMSE) of 9.04 mg/dL and an R-squared value of 0.84, whereas the GNB model exhibited the poorest performance, with an RMSE of 68.07 mg/dL.
The ML models in this study use features measured from Empatica E4 smartwatch data and food logs. The study then calculates sleep features derived from smartwatch data, demonstrating that their inclusion in ML models, particularly DT, RF, and SVM, reduces the Mean Absolute Error (MAE) of predicted IG from 8.02 ± 0.22 mg/dL to 6.59 ± 0.33 mg/dL and enhances classification accuracy from 0.7988 ± 0.0183 to 0.8265 ± 0.0111, with statistical significance (p = 0.0001). These findings underscore the critical role of sleep metrics obtained from smartwatch devices in improving the accuracy of glucose prediction models. By first improving on the literature through more effective use of smartwatch sensor data, and then incorporating novel sleep metrics derived from these devices, the study demonstrates a further refinement in predictive capabilities. Additionally, the application of explainable machine learning techniques, such as SHAP values, provides deeper insights into how various physiological factors influence glucose levels (for example, highlighting the effect of slow wave sleep and wake bouts on glucose prediction models), and partial dependence plots show how feature interactions are modelled by the RF model. The broader implications of this work suggest that integrating multi-modal data from wearables into interpretable ML models could significantly advance the management of metabolic health conditions, offering more personalized and effective interventions.
