Optimizing the computation of the Fourier Spectrum from decision trees

Date
2015
Authors
Nugraha, Johan Jonathan
Supervisor
Pears, Russel
Item type
Thesis
Degree name
Master of Computer and Information Sciences
Journal Title
Journal ISSN
Volume Title
Publisher
Auckland University of Technology
Abstract

In the field of mining data stream, processing time is one of the most important factor because data instances continuously arrive at high-speed. Thus, it is crucial to process each instance in timely manner. This concern is further emphasized when Discrete Fourier Transform (DFT) is applied to Decision Tree systems. DFT is highly beneficial in data stream mining as it allows us to represent a Decision Tree with less memory usage while preserving the accuracy of the prediction model. However, DFT is notorious for being computationally expensive to perform. An existing solution to mitigate this problem is by computing coefficients for smaller order. This approach, however, is highly inefficient for deep trees as the underlying problem space grows exponentially.

We seek to improve the efficiency of computing Fourier coefficients of Decision Trees in this study. We propose a divide and conquer solution by decomposing a Decision Tree into small sub-trees. Local Fourier coefficient table are then built for each sub-tree. The final Fourier coefficient table is then compiled by tabulating all the local Fourier coefficient tables. Local coefficient tables are build incrementally as it allows us to build a coefficient table from an existing coefficient table. This is particularly useful for two purposes. First is to track and synchronize Decision tree and its Fourier spectrum on the fly. Second is to exploit the possibility for reoccurring concepts. The latter purpose will be explored further in future works.

In our experiments, we compare the runtime of our solution to that of the normal DFT process with various order cutoff on both real and artificial datasets. The results show that our solution generally performed better than normal DFT process with various degrees of success from moderate to significant improvements. We also measured the runtime performances of our solution in eager synchronization setup for comparison.

Description
Keywords
Recurrent concepts , Data stream mining , Discrete Fourier Transform , Divide and conquer , Recurrent concept mining
Source
DOI
Publisher's version
Rights statement
Collections