Resource Optimization in Heterogeneous Distributed Data Stream Mining with Performance Assessment on Asthma Hospitalization Predictive Modeling

Bhalla, Rashi

Files

Thesis embargoed until 24 March 2026

Size: 13.54 MB, File format: Adobe PDF

Date

2023

Authors

Bhalla, Rashi

Supervisor

Mirza, Farhaan

Naeem, M. Asif

Degree name

Doctor of Philosophy

Publisher

Auckland University of Technology

Abstract

The Big Data Era has presented many opportunities for data mining techniques to discover knowledge patterns across an exponentially growing, diverse collection of data. In many application domains, data exists in a distributed fashion across geographical locations; as such, the nature of the data collected by each location may differ from that of its peer nodes in the network, thus causing heterogeneity in the data. Such distributed databases require mining methods distinct from those used for traditional homogeneous distributed databases, where the structure is identical across locations. The current demand is to analyze real-time data; consequently, stream computing is becoming a popular choice. Data generated continuously at a high pace at distributed sites or locations are termed distributed data streams. Recent approaches toward Distributed Data Mining (DDM) have focused on addressing the heterogeneous nature of data sources. However, such approaches do not prioritize the reduction of data communication costs, which can be prohibitive in large-scale sensor networks where bandwidth is a limited resource. In fact, high communication and computational costs are the two most prominent problems encountered in heterogeneous distributed environments. An effort to decrease communication in the distributed environment adversely influences classification accuracy; therefore, a research challenge lies in maintaining a balance between transmission cost, computational cost, and accuracy. This research covers the heterogeneous distributed data mining problem, extendable to the case where data arrives continuously in streaming mode. We propose a suite of algorithms to address specific issues in mining data from heterogeneous distributed streaming settings. Our experimental testing reveals that performance efficiency can be achieved across a wide range of datasets.
The first algorithm, Performance Optimizer in Distributed Stream Mining (PODSM), which has its roots in Bayesian inference, targets reducing the communication volume and resource time in a heterogeneous DDM environment while retaining prediction accuracy. A reduction of 34.66% in communication was obtained for one of the datasets, with nearly 27% savings in resource time. The second algorithm, Minimized Tree for Distributed Mining (MTDM), presents an efficient and robust method for learning the relationship between various distributed sites using a tree. In this regard, a saving of 37.65% in resource time has been reported for one dataset while improving the accuracy by 1.33%. To assess the algorithms’ competency, we validated them on a case study built on real-world datasets that classifies demand for asthma-related emergency hospitalizations as Low or High. Considerable savings in communication and resource time were attained upon execution of PODSM and MTDM while preserving accuracy levels, thus demonstrating their potential to achieve a good trade-off between accuracy and resource utilization. The study concludes that PODSM and MTDM are proficient at jointly serving heterogeneous distributed data sources in any resource-constrained scenario. Moreover, the capability of the algorithms to maintain a balance between accuracy, communication, and resource time makes them flexible enough for a diverse range of applications.

Permanent link

http://hdl.handle.net/10292/16330

Collections

Doctoral Theses
