Computational Methods in Machine Learning for Privacy Preservation
Shi, Catherine; Nguyen, Minh; Yan, Weiqi; Wu, Jinsong; Ma, Bo
Thesis (Open Access), 2026; made available 2026-03-12.
http://hdl.handle.net/10292/20762

Vast amounts of data are now stored online for training deep learning models. Cryptographic techniques such as fully homomorphic encryption (FHE) and secure multi-party computation (MPC) can, in principle, enable machine learning over encrypted or distributed data, but they protect the confidentiality of computation and communication, and they address different threat models and application scenarios. This thesis focuses on a complementary concern: protecting against inference from the outputs of machine learning (e.g., released models or predictions), such as membership inference, attribute inference, and model inversion, for which differential privacy (DP) is well suited. Differential privacy limits such inference by adding calibrated noise according to a predetermined privacy budget. However, introducing differential privacy to achieve privacy protection degrades the learning performance of the machine learning model. The central question is therefore whether methods can be found to measure both privacy-preserving capability and machine learning accuracy, and whether a privacy-preserving method can be combined with the machine learning model to balance the trade-off between accuracy and privacy.

Motivated by this question, this thesis presents a privacy-preserving framework for deep learning that contributes towards solving this problem. The framework consists of three layers, corresponding to the three stages of the proposed approach: a pre-processing layer, a model layer, and an assessment layer. The pre-processing layer implements privacy preservation by generating the noise required for privacy protection. In the model layer, multiple methods for natural language processing and object detection/recognition are investigated on the noise-injected data; two main approaches are proposed at this stage, the privacy-preserving deep transformation self-attention (PPDPTS) algorithm and the PDPIFSEA algorithm. In the assessment layer, quality assessment verifies the trained model to determine whether the outputs it infers reach the desired level of privacy. Within this framework, statistical analysis methods are proposed to quantify the privacy-protection noise of deep learning models: the BUA, TDA, and EMPA algorithms for quantifying image-based datasets, and the ABAPER and PBVS algorithms. Experimental results show that the models in this framework achieve higher accuracy than those using other privacy-preserving methods.

The original contributions of this thesis are:
(1) A novel dynamic entropy-based noise-generating method with differential privacy to improve privacy protection in federated deep learning.
(2) A novel distributed stochastic gradient descent method that improves the performance of privacy-preserving deep learning.
(3) A privacy-preserving deep transformation self-attention (PPDPTS) method that applies a self-attention mechanism to the visual features of images to assist privacy preservation.
(4) A method to measure the privacy boundary and privacy budget in privacy-preserving deep learning and to test the privacy budget for preventing privacy leakage in the output data, together with a predictive reconstruction algorithm that predicts the data distribution in the privacy-preserving deep learning model and aligns the privacy budget with the prediction results.
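The abstract states that differential privacy adds noise according to a predetermined privacy budget but does not detail the thesis's own noise-generation algorithms. As a minimal, generic illustration only (the standard Laplace mechanism, not the dynamic entropy-based method above), the Python sketch below assumes the query's sensitivity and the budget epsilon are known:

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=np.random.default_rng(0)):
    # Standard epsilon-DP Laplace mechanism: noise scale = sensitivity / epsilon,
    # so a smaller privacy budget (epsilon) means more noise and less accuracy.
    scale = sensitivity / epsilon
    return value + rng.laplace(loc=0.0, scale=scale, size=np.shape(value))

# Usage: privatise the mean of 1,000 values bounded in [0, 1].
# The sensitivity of such a mean is 1/n (one record shifts it by at most 1/n).
data = np.random.default_rng(1).random(1000)
noisy_mean = laplace_mechanism(data.mean(), sensitivity=1.0 / len(data), epsilon=0.5)
print(f"true mean {data.mean():.4f}, noisy mean {noisy_mean:.4f}")
```

This makes the accuracy/privacy trade-off described above concrete: tightening epsilon inflates the noise scale, which is exactly the degradation the framework's assessment layer is meant to quantify.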
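Contribution (2) mentions a distributed stochastic gradient descent variant whose details are not given in this abstract. For reference only, here is a minimal single-node DP-SGD step in the common style of Abadi et al. (per-example gradient clipping plus Gaussian noise); the names and hyperparameters (clip_norm, noise_multiplier) are illustrative assumptions, not the thesis's method:

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.1, rng=None):
    """One DP-SGD update: clip each per-example gradient to clip_norm,
    average, then add Gaussian noise scaled to the clipping norm."""
    rng = rng or np.random.default_rng()
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(per_example_grads),
                       size=mean_grad.shape)
    return params - lr * (mean_grad + noise)

# Usage with toy per-example gradients for a 3-parameter model.
params = np.zeros(3)
grads = [np.array([0.5, -1.2, 0.3]), np.array([2.0, 0.1, -0.4])]
params = dp_sgd_step(params, grads, rng=np.random.default_rng(0))
```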