Novel methods for distributed and privacy-preserving data stream mining

Denham, Benjamin James

Date

2019

Authors

Denham, Benjamin James

Supervisor

Pears, Russel

Naeem, Muhammad Asif

Item type

Thesis

Degree name

Master of Computer and Information Sciences

Publisher

Auckland University of Technology

Abstract

The growing number of “big” datasets presents many opportunities for data mining, but also raises a variety of new challenges. Datasets may take the form of continuous streams with constantly changing patterns, they may be too widely distributed to be centralised for analysis at a single location, or they may contain sensitive values that data owners are not willing to share due to privacy concerns. Much past research has considered these issues individually, but few existing methods can address combinations of them. Therefore, this research develops methods for distributed and privacy-preserving data stream mining: a novel Hierarchical Distributed Stream Miner (HDSM) that learns relationships between the features of separate streams with minimal data transmission to central locations, and two data perturbation methods for privacy-preserving stream mining based on the combination of random projection, random translation, and additive noise. Experimental evaluation of HDSM demonstrates significant improvements in classification accuracy over existing distributed stream mining approaches while minimising data transmission and computational costs. HDSM’s ability to dynamically trade off accuracy against these costs is also demonstrated. Variations of the known input-output Maximum A Posteriori (MAP) attack are developed to experimentally evaluate the data perturbation methods, and the proposed composite methods are shown to achieve a better trade-off between privacy and model accuracy than random projection alone. Finally, an approach is described for combining HDSM with data perturbation to achieve distributed privacy-preserving stream mining.
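
To make the three perturbation ingredients named in the abstract concrete, the following is a minimal illustrative sketch of how random projection, random translation, and additive noise can be composed on a single record, as y = R x + t + e. It is not the thesis's actual method: the abstract does not specify how its two composite methods combine these steps, so the function name `make_perturbation`, its parameters, the Gaussian projection matrix, the uniform translation range, and the order of operations are all assumptions chosen for illustration.

```python
import random


def make_perturbation(n_features, n_projected, noise_std, seed=0):
    """Return a function computing y = R x + t + e for one record.

    Hypothetical helper for illustration only; the distributions and the
    order of operations are assumptions, not taken from the thesis.
    """
    rng = random.Random(seed)
    scale = 1.0 / n_projected ** 0.5

    # Random projection: an n_projected x n_features matrix R with
    # Gaussian entries, drawn once and kept secret by the data owner.
    R = [[rng.gauss(0.0, scale) for _ in range(n_features)]
         for _ in range(n_projected)]

    # Random translation: a fixed offset vector t, also drawn once.
    t = [rng.uniform(-1.0, 1.0) for _ in range(n_projected)]

    def perturb(x):
        if len(x) != n_features:
            raise ValueError("record has the wrong number of features")
        # Additive noise: a fresh Gaussian draw for every output value.
        return [
            sum(r_ij * x_j for r_ij, x_j in zip(row, x))
            + t_i
            + rng.gauss(0.0, noise_std)
            for row, t_i in zip(R, t)
        ]

    return perturb


if __name__ == "__main__":
    perturb = make_perturbation(n_features=4, n_projected=3, noise_std=0.1)
    # Each record in the stream is perturbed before it leaves its owner.
    for record in ([5.1, 3.5, 1.4, 0.2], [6.2, 2.9, 4.3, 1.3]):
        print(perturb(record))
```

In this sketch the projection and translation are fixed for the whole stream, while the noise changes per record. Setting `noise_std` to zero and dropping `t` reduces it to plain random projection, the baseline the abstract compares against.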

Keywords

machine-learning, data stream mining, distributed data mining, privacy-preserving data mining

Permanent link

https://hdl.handle.net/10292/12536

Collections

Masters Theses
