A Staged Approach to Classification in High Speed Concept Drifting Data Streams

aut.embargoNoen_NZ
aut.thirdpc.containsNoen_NZ
dc.contributor.advisorPears, Russel
dc.contributor.advisorNaeem, Asif
dc.contributor.authorKithulgoda, Chamari Indunil
dc.date.accessioned2019-05-14T22:24:24Z
dc.date.available2019-05-14T22:24:24Z
dc.date.copyright2019
dc.date.issued2019
dc.date.updated2019-05-14T07:35:35Z
dc.description.abstractData stream classification task needs to address challenges of enormous volume, continuous rapid flow, and concept drift of data in the presence of limited computer resources. A successful classifier is expected to result in higher accuracy and throughput under constrained memory conditions. This thesis aims at the problem of increasing throughput without sacrificing the accuracy of the classifier in concept drifting and recurring data streams. The solution introduces a novel stage learning framework that senses the context of data to determine the level of volatility in the stream. Two learning stages namely, Stage 1 and Stage 2 are defined in accordance with stream volatility. Stage 1 is the high volatility state where many new, previously unseen concepts appear. Stage 1 represents an intensive learning phase in the lifetime of a data stream and hence an ensemble of classifiers is used in this stage as previous research has shown that ensembles excel in such situations. In Stage 1, concepts are captured by the ensemble learner and then stored in an online repository for future use in the event that such concepts recur in the future. In contrast, Stage 2 represents a low volatility state and features a limited learning ability as it relies to a greater extent on concepts already captured in Stage 1. Our empirical results reveal that such a staged approach achieves a significant throughput advantage as a result of the deactivation of computationally expensive ensemble learner in Stage 2. Two main versions of the stage learning paradigm, namely Staged Online Learner (SOL), and SOL with Incremental Fourier Classifier (SOL-IFC) is presented. The Stage 1 segment of both SOL and SOL-IFC is driven by a forest of Hoeffding decision trees, each of which has its own drift detector. The Stage 2 learning segment of SOL is carried out by Fourier spectral representations of learned decision trees and a derivative in the form of a single decision tree. Although Stage 2 processing overhead of SOL is much smaller than in Stage 1, peaks in processing time occur at concept drifts have to be prevented. Consequently, an extended version of SOL named as SOL-IFC introduces an innovative incrementally adaptive Fourier spectrum arrangement for Stage 2. The IFC comprises of novel noisy feature filtering and instance synopsis generation mechanisms that contribute to further performance enhancements. As per empirical evidence, SOL-IFC outperforms state-of-art algorithms on a combined metric of throughput and classification accuracy while avoiding the issue of peak processing times at drifts with a smoother incremental Fourier adaptation process. The implementation independence of the staged learning approach is proven by the third version named as MLP-FC that replaces the decision tree forest with a neural network classifier during Stage 1.en_NZ
dc.identifier.urihttps://hdl.handle.net/10292/12511
dc.language.isoenen_NZ
dc.publisherAuckland University of Technology
dc.rights.accessrightsOpenAccess
dc.subjectData Streamen_NZ
dc.subjectClassificationen_NZ
dc.subjectConcept Driften_NZ
dc.subjectConcept Recurrenceen_NZ
dc.subjectIncremental Classifieren_NZ
dc.subjectDiscreet Fourier Transformen_NZ
dc.subjectFourier Spectrumen_NZ
dc.subjectEnsemble Classifieren_NZ
dc.subjectDecision Treeen_NZ
dc.subjectThroughput Accuracy Trade-offen_NZ
dc.titleA Staged Approach to Classification in High Speed Concept Drifting Data Streamsen_NZ
dc.typeThesisen_NZ
thesis.degree.grantorAuckland University of Technology
thesis.degree.levelDoctoral Theses
thesis.degree.nameDoctor of Philosophyen_NZ
Files
Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
KithulgodaC.pdf
Size:
1.59 MB
Format:
Adobe Portable Document Format
Description:
Thesis
License bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
license.txt
Size:
889 B
Format:
Item-specific license agreed upon to submission
Description:
Collections