Optimising the Trade-Off Between Accuracy and Privacy in Data Stream Mining Environments
Data streams differ from static datasets in several respects: they are incremental, high-speed, high-volume, subject to concept drift, and dynamically adapting. These characteristics make Privacy-Preserving Data Stream Mining (PPDSM) challenging. A central concern in PPDSM is the trade-off between data privacy and data mining accuracy, and optimising this trade-off is complicated by the nature of data streams. Although privacy-preserving methods have been proposed to optimise this trade-off in PPDSM, there is still room for improvement, and well-structured frameworks for accuracy-privacy optimisation are lacking. This research aims to implement an appropriate perturbation method that provides an optimal trade-off between data privacy and data mining accuracy in PPDSM, and ultimately to develop a well-structured framework built on it.
We proposed seven variations of noise addition methods to achieve high privacy while maintaining high accuracy. These novel methods combine cumulative noise addition, noise resetting, and cycle-wise noise addition, inspired by the well-known logistic function. The best-performing of the proposed variations was used to build the Accuracy Privacy optimising Framework (APOF). APOF rests on two premises: the required levels of accuracy and privacy depend entirely on the user, and achieving both 100% accuracy and 100% privacy is not possible. Consequently, APOF was designed to optimise the accuracy-privacy trade-off according to the user's privacy requirements. The optimisation is achieved through a data fitting module. Finally, we extended APOF to Enhanced-APOF so that it operates in data streaming environments.
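As a rough illustration of how cumulative noise, noise resetting, and logistic-shaped cycles might fit together, the following minimal Python sketch perturbs a numeric stream. It is not the method evaluated in this work: the function names (`logistic`, `perturb_stream`), the Gaussian noise model, and the parameters `cycle_len`, `max_scale`, and `steepness` are all assumptions made for illustration.

```python
import math
import random


def logistic(t, k=1.0, t0=0.0, L=1.0):
    """Standard logistic function L / (1 + exp(-k * (t - t0)))."""
    return L / (1.0 + math.exp(-k * (t - t0)))


def perturb_stream(values, cycle_len=100, max_scale=1.0, steepness=0.1, seed=0):
    """Yield perturbed values from a numeric stream (hypothetical sketch).

    Assumptions, not taken from the source:
      * noise is Gaussian with zero mean;
      * within a cycle, the noise scale follows a logistic curve of the
        position in the cycle, centred at the cycle midpoint;
      * noise accumulates within a cycle and is reset to zero when a
        new cycle starts.
    """
    rng = random.Random(seed)
    cumulative = 0.0
    for i, x in enumerate(values):
        pos = i % cycle_len
        if pos == 0:
            cumulative = 0.0  # noise resetting at the start of each cycle
        scale = logistic(pos, k=steepness, t0=cycle_len / 2, L=max_scale)
        cumulative += rng.gauss(0.0, scale)  # cumulative noise addition
        yield x + cumulative


if __name__ == "__main__":
    stream = [float(v) for v in range(10)]
    for original, noisy in zip(stream, perturb_stream(stream, cycle_len=5)):
        print(original, noisy)
```

Because the function is a generator, it processes one record at a time and never needs the whole stream in memory, which suits the incremental nature of data streams. Raising `max_scale` would be expected to increase privacy at the cost of accuracy.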
Logistic cumulative noise addition outperformed the other proposed noise addition methods in terms of both accuracy and privacy. The optimised accuracy-privacy trade-off was achieved through cycle-wise noise addition, with the cycles designed according to the logistic function. Using logistic cumulative noise addition as the privacy-preserving technique in APOF allowed us to exploit all of these benefits. Through the data fitting module of APOF, we predicted the accuracy level corresponding to a user-defined privacy threshold with only a small error. APOF allows users to fine-tune their requirements if needed and to further optimise the accuracy-privacy trade-off accordingly. Experimental evidence shows that Enhanced-APOF is a well-structured framework for accuracy-privacy trade-off optimisation in a data streaming environment, as it was designed with the nature of data streams in mind. The combination of logistic cumulative noise addition for privacy preservation, Hoeffding Adaptive Tree for classification, and data fitting for optimisation proved effective in achieving accuracy-privacy trade-off optimisation.
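The data fitting step can be pictured as fitting a curve to observed (privacy, accuracy) pairs and reading off the accuracy predicted for a user-defined privacy threshold. The sketch below assumes a simple least-squares straight line; the fitting model actually used in APOF is not stated here, and the helper names (`fit_line`, `predict_accuracy`) and the numeric values are hypothetical placeholders, not results from this work.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_accuracy(privacy_threshold, slope, intercept):
    """Predict the accuracy level for a user-defined privacy threshold."""
    return slope * privacy_threshold + intercept


if __name__ == "__main__":
    # Placeholder observations for illustration only (NOT measured results):
    # privacy level versus classifier accuracy on the perturbed stream.
    privacy_levels = [0.1, 0.3, 0.5, 0.7, 0.9]
    accuracies = [0.95, 0.90, 0.85, 0.80, 0.75]

    slope, intercept = fit_line(privacy_levels, accuracies)
    user_threshold = 0.6  # hypothetical user requirement
    print(predict_accuracy(user_threshold, slope, intercept))
```

A user who finds the predicted accuracy too low could relax the privacy threshold and query the fitted model again, which mirrors the fine-tuning loop described above.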