Optimizing the computation of the Fourier Spectrum from decision trees
Date
Authors
Supervisor
Item type
Degree name
Journal Title
Journal ISSN
Volume Title
Publisher
Abstract
In the field of mining data stream, processing time is one of the most important factor because data instances continuously arrive at high-speed. Thus, it is crucial to process each instance in timely manner. This concern is further emphasized when Discrete Fourier Transform (DFT) is applied to Decision Tree systems. DFT is highly beneficial in data stream mining as it allows us to represent a Decision Tree with less memory usage while preserving the accuracy of the prediction model. However, DFT is notorious for being computationally expensive to perform. An existing solution to mitigate this problem is by computing coefficients for smaller order. This approach, however, is highly inefficient for deep trees as the underlying problem space grows exponentially.
We seek to improve the efficiency of computing Fourier coefficients of Decision Trees in this study. We propose a divide and conquer solution by decomposing a Decision Tree into small sub-trees. Local Fourier coefficient table are then built for each sub-tree. The final Fourier coefficient table is then compiled by tabulating all the local Fourier coefficient tables. Local coefficient tables are build incrementally as it allows us to build a coefficient table from an existing coefficient table. This is particularly useful for two purposes. First is to track and synchronize Decision tree and its Fourier spectrum on the fly. Second is to exploit the possibility for reoccurring concepts. The latter purpose will be explored further in future works.
In our experiments, we compare the runtime of our solution to that of the normal DFT process with various order cutoff on both real and artificial datasets. The results show that our solution generally performed better than normal DFT process with various degrees of success from moderate to significant improvements. We also measured the runtime performances of our solution in eager synchronization setup for comparison.