Distributed incremental data stream mining for wireless sensor network
Wireless sensor networks (WSNs), despite their energy, bandwidth, storage, and computational power constraints, have embraced dynamic applications. These applications generate large amounts of data continuously, at high speed, and at distributed locations; such data are known as distributed data streams. In these applications, processing data streams on the fly and at distributed locations is necessary for three main reasons. Firstly, the volume of data that these systems generate exceeds the storage capacity of the system. Secondly, transmitting such large continuous data to a central processing location over the air rapidly exhausts the energy of the system and limits its lifetime. Thirdly, these applications implement dynamic models that are triggered immediately in response to events, such as changes in the environment or in a set of conditions, and hence do not tolerate offline processing. Therefore, it is important to design efficient distributed techniques for WSN data stream mining applications under these inherent constraints. The purpose of this study was to develop a resource-efficient online distributed incremental data stream mining framework for WSNs. The framework must minimize inter-node communication and optimize local computation and energy efficiency without compromising practical application requirements and quality of service (QoS). The objectives were to address WSN energy constraints, network lifetime, and distributed mining of streaming data. Another objective was to develop, based on the framework, a novel high spatiotemporal resolution version of the standard Canadian Fire Weather Index (FWI) system, called the Micro-scale FWI system. The proposed framework integrates an autonomous cluster-based data stream mining technique and a two-tiered hierarchical WSN architecture to suit the distributed nature of WSNs and on-the-fly stream mining requirements.
The underlying principle of the framework is to handle the sensor stream mining process in-network, at distributed locations and at multiple hierarchical levels. The approach consists of three distinct processing tasks that mine the sensor data streams asynchronously but cooperatively: the sensor node, cluster head, and network sink processing tasks. These tasks were formulated using a lightweight autonomous data clustering algorithm called Subtractive Fuzzy C-Means (SUBFCM). The SUBFCM algorithm remains embedded within the individual nodes to analyze the locally generated streams ‘on the fly’ in cooperation with a group of nodes. The study examined the effects of data stream characteristics such as stream dimension and stream period (data flow rate). Moreover, it evaluated, through simulations, the effects of network architecture parameters such as node density per cluster and tolerated approximation error on the overall performance of SUBFCM. Finally, the QoS, that is, the level of guaranteed performance that the WSN architecture supports for applications utilizing the framework, was examined. The results showed that the proposed framework scales with stream dimension and data flow rate, with average errors of less than 12% and 11%, respectively, relative to the benchmarks. The node density per cluster and the local model drift threshold showed significant effects on framework performance only for very fast streams. The study concludes that the network architecture is an important factor in the quality of mining results and should be designed carefully to make optimal use of the basic concepts of the framework. The overall mining quality is directly related to the combined effect of the stream characteristics, the network architecture, and the desired performance measures. The study also concludes that WSNs can provide QoS adequate for online distributed incremental data stream mining applications.
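As a rough illustration of the sensor node task and the local model drift threshold mentioned above, the following Python sketch clusters each window of readings locally and forwards the cluster centers to the cluster head only when they have moved by more than a threshold. It is not the SUBFCM algorithm from the study: it assumes plain fuzzy c-means with a naive initialization in place of the subtractive step, and the names fcm, drift, SensorNode, and process_window, as well as the windowing scheme and the drift measure, are hypothetical choices made for this example.

```python
import math


def fcm(points, c, m=2.0, iters=50):
    """Plain fuzzy c-means over a list of equal-length tuples; returns c sorted centers.

    Assumption: centers start at evenly spaced points of the sorted window.
    SUBFCM would choose them differently; this is only a stand-in.
    """
    pts = sorted(points)
    n, dims = len(pts), len(pts[0])
    centers = [pts[(k * (n - 1)) // max(c - 1, 1)] for k in range(c)]
    for _ in range(iters):
        # Standard FCM membership update (distances clamped to avoid division by zero).
        u = []
        for p in pts:
            d = [max(math.dist(p, ctr), 1e-12) for ctr in centers]
            u.append([
                1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                for i in range(c)
            ])
        # Each center becomes the membership-weighted mean of the window.
        new_centers = []
        for i in range(c):
            w = [u[j][i] ** m for j in range(n)]
            total = sum(w)
            new_centers.append(tuple(
                sum(w[j] * pts[j][dim] for j in range(n)) / total
                for dim in range(dims)
            ))
        centers = new_centers
    return sorted(centers)


def drift(old, new):
    """Largest displacement between centers paired by sorted order (hypothetical measure)."""
    return max(math.dist(a, b) for a, b in zip(old, new))


class SensorNode:
    """Hypothetical node-level task: mine locally, transmit only on sufficient drift."""

    def __init__(self, c, threshold):
        self.c = c
        self.threshold = threshold
        self.sent = None  # last local model forwarded to the cluster head

    def process_window(self, window):
        """Return the centers to transmit, or None if the local model has not drifted enough."""
        model = fcm(window, self.c)
        if self.sent is None or drift(self.sent, model) > self.threshold:
            self.sent = model
            return model
        return None
```

In this sketch, a larger threshold means fewer transmissions and a coarser model at the cluster head, which is the kind of trade-off between communication cost and approximation error that the study evaluates.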
Simulations using real weather datasets indicate that the Micro-scale FWI system closely approximates the results obtained from the standard FWI system while providing far finer spatial and temporal information. This allows direct local and global interaction with areas of a few square meters, as opposed to the tens of square kilometers covered by present systems.