Capturing recurring concepts in high speed data streams

aut.embargoNoen_NZ
aut.thirdpc.containsNoen_NZ
aut.thirdpc.permissionNoen_NZ
aut.thirdpc.removedNoen_NZ
dc.contributor.advisorPears, Russel
dc.contributor.authorSakthithasan, Sripirakas
dc.date.accessioned2015-05-08T00:03:17Z
dc.date.available2015-05-08T00:03:17Z
dc.date.copyright2015
dc.date.created2015
dc.date.issued2015
dc.date.updated2015-05-07T04:04:23Z
dc.description.abstractThis research addresses two key issues in high speed data stream mining that are related to each other. One fundamental issue is the detection of concept change that is an inherent feature of data streams in general in order to make timely and accurate structural changes to classification or prediction models. The shortcomings in the past research were addressed in two versions of a change detector that were produced during this research. The second major issue is the detection of recurring patterns in a supervised learning context to gain significant efficiency and accuracy advantages over systems that have severe time constraints on response time to change due to safety and time critical requirements. Capturing recurrent patterns requires the detection of concept change with minimal false positives. This research addresses this latter problem as a pre-requisite to formulating a novel mechanism for recognizing recurrences in a dynamic data stream environment. The first approach to change detection, termed SeqDrift1 that relies on a detection threshold derived using the Bernstein bound and sequential hypothesis strategy ensured much lower false positive rates and processing time than the most widely used change detector, ADWIN. The second version of the change detector, SeqDrift2, achieved significant improvement on detection sensitivity over SeqDrift1. This was achieved through two separate strategies. The first was the use of reservoir sampling to retain a larger proportion of older instances thus providing for better contrast with newer arriving instances belonging to a changed concept. The second strategy was to trade off false positive rate for detection delay in an optimization procedure. The net result was that SeqDrift2 achieved much lower detection delay than SeqDrift1 but sacrificed some of its false positive rate when compared to SeqDrift1, while still retaining its superiority with respect to this measure vis-à-vis ADWIN and other change detectors. Having proposed a robust and efficient mechanism for change detection two different meta-learning schemes for recurrent concept capture were proposed. A novel framework using the two schemes consists of concept change detectors to locate concept boundaries, a Hoeffding tree compressor to exploit the application of Discrete Fourier Transform on Decision Trees to produce compact Fourier Spectra, a forest of Hoeffding Trees to actively learn and a pool of Fourier spectra to be reused on similar recurring concepts. In the first scheme, termed Fourier Concept Trees (FCT), each Fourier spectrum is separately stored and reused on similar concepts. Accuracy and memory advantages have been empirically shown over an existing method called, MetaCT. In the second scheme, instead of storing each spectrum on its own, an ensemble approach, Ensemble Pool (EP), was adopted whereby several spectra were aggregated into single composite spectrum. The major advantage of this strategy over the first was the reduction in storage overhead as redundancies in separate spectra are eliminated by merging into one single entity. In addition, Fourier spectrum generation is optimized with theoretical guarantees to suit high speed environments. Extensive experimentation that demonstrated the benefits including accuracy stabilization, memory gain, reusability of existing models etc., has been done with a number of synthetic and real world datasets. This includes a case study on a Flight simulator system which is one of the target applications of this research.en_NZ
dc.identifier.urihttps://hdl.handle.net/10292/8647
dc.language.isoenen_NZ
dc.publisherAuckland University of Technology
dc.rights.accessrightsOpenAccess
dc.subjectRecurrent conceptsen_NZ
dc.subjectConcept drift detectionen_NZ
dc.subjectData stream miningen_NZ
dc.subjectDiscrete Fourier transformen_NZ
dc.subjectBernstein bounden_NZ
dc.subjectSequential hypothesis testingen_NZ
dc.subjectReservoir samplingen_NZ
dc.titleCapturing recurring concepts in high speed data streamsen_NZ
dc.typeThesis
thesis.degree.discipline
thesis.degree.grantorAuckland University of Technology
thesis.degree.grantorAuckland University of Technology
thesis.degree.levelDoctoral Theses
thesis.degree.nameDoctor of Philosophyen_NZ
Files
Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
SripirakasS.pdf
Size:
3.4 MB
Format:
Adobe Portable Document Format
Description:
Whole thesis
License bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
license.txt
Size:
889 B
Format:
Item-specific license agreed upon to submission
Description:
Collections