Near closed frequent itemsets to accelerate the generation of association rules in a data stream environment
The subject of this research is data stream mining, one of the most challenging and widely researched areas in Knowledge Discovery and Data Mining (KDD). A data stream is a continuous, voluminous, and unpredictable flow of data that arises in many application domains. In a previous study, the Data Stream Mining (DSM) algorithm was proposed to address these challenges in association rule mining. DSM combines several techniques, including closed frequent itemsets, tree data structures, itemset pruning, and statistical sampling. In this study, we examine the characteristics of closed frequent itemsets and propose a novel concept called Near Closed Nodes (NCN), which can be applied to association rule mining algorithms that use a closed itemset structure. We explored this concept in detail and then integrated it into the existing DSM algorithm. Incorporating NCN into DSM improved both processing speed and memory usage. A comprehensive experimental study compared the performance of DSM and DSM-NCN on both simulated and real-world datasets. The results show that DSM-NCN outperformed DSM in most cases, particularly on dense datasets.
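Both DSM and NCN build on closed frequent itemsets. As a reminder, a frequent itemset is closed if none of its proper supersets has the same support. The sketch below illustrates only that definition on a static, in-memory list of transactions, using brute-force enumeration. It is not the DSM or NCN algorithm, whose streaming, tree, and pruning mechanics the abstract does not specify. The function names `frequent_itemsets` and `closed_itemsets` are hypothetical, chosen for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of itemsets with support >= min_support.

    Support is the number of transactions that contain the itemset.
    Illustrative only: the cost is exponential in the number of distinct items.
    """
    items = sorted({item for t in transactions for item in t})
    tx_sets = [frozenset(t) for t in transactions]
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = sum(1 for t in tx_sets if candidate <= t)
            if support >= min_support:
                result[candidate] = support
                found_any = True
        if not found_any:
            # Anti-monotonicity: if no itemset of this size is frequent,
            # no larger itemset can be frequent either.
            break
    return result


def closed_itemsets(frequent):
    """Keep the frequent itemsets that have no proper superset with equal support.

    Checking only frequent supersets is sufficient, because a superset with the
    same support as a frequent itemset is itself frequent.
    """
    return {
        itemset: support
        for itemset, support in frequent.items()
        if not any(
            itemset < other and support == other_support
            for other, other_support in frequent.items()
        )
    }


if __name__ == "__main__":
    tx = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
    freq = frequent_itemsets(tx, min_support=2)
    closed = closed_itemsets(freq)
    for itemset, support in sorted(closed.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), support)
```

On this toy input, the seven frequent itemsets reduce to four closed ones: {a}, {a, b}, {a, c}, and {a, b, c}. The support of any non-closed frequent itemset, such as {b}, equals the largest support among its closed supersets (here 3, from {a, b}). This compression is why closed itemsets are a natural basis for memory-constrained stream mining.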