Evaluation of a data mining application adopting private information retrieval in the cloud computing environment
Cloud computing has become a cost-effective and practical platform for data-intensive data mining. Data mining results are highly sensitive and should remain private to the end user if the service is to be trusted. Although cloud vendors provide a range of security controls, users remain concerned about internal security loopholes arising from cloud service provider staff, such as database administrators (DBAs) or data analysts. Private information retrieval (PIR) is a protocol that lets a user retrieve information from a database without revealing to the database holder which information was retrieved. However, few studies have examined the feasibility and efficiency of implementing PIR for data mining in a cloud environment, which is what this research set out to investigate. The aim was to determine whether PIR can improve security without negatively affecting performance. A data mining application was implemented in a cloud environment, and a PIR protocol was applied to it to improve security. The processing times of the PIR protocol and of the entire data mining application were recorded over multiple datasets of different sizes. The results were analysed using a t-test and linear regression to examine the relationships among dataset size, PIR processing time and the processing time of the entire application. The experiments showed that the PIR protocol used in this research is capable of encrypting query results while still producing correct query results. There are indications that PIR processing time will eventually constitute 90% of the overall processing time; the PIR protocol used in this research was therefore found to be inefficient for the experimental data mining application with large datasets. This research has shown that the PIR protocol requires further improvement for use with big data, and that other encryption methods should also be investigated in order to secure data mining results.
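To make the idea of PIR concrete, the following is a minimal sketch of the classic two-server XOR scheme, in which neither server on its own learns which record the client wants. It is an illustration only: the abstract does not name the PIR protocol that was evaluated, so this should not be read as the protocol used in this research. The function names and the fixed-length byte records are hypothetical, and the scheme assumes two non-colluding servers that hold identical copies of the database.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_queries(n: int, wanted: int) -> tuple[set[int], set[int]]:
    """Build one query per server (hypothetical helper).

    The first query is a uniformly random subset of the n indices.
    The second is the same subset with the wanted index toggled.
    Each subset on its own is uniformly random, so a single server
    cannot tell which index the client is after.
    """
    q1 = {j for j in range(n) if secrets.randbits(1)}
    q2 = q1 ^ {wanted}  # symmetric difference toggles membership of `wanted`
    return q1, q2


def server_answer(database: list[bytes], subset: set[int]) -> bytes:
    """Server side: XOR together every record whose index is in the subset."""
    acc = bytes(len(database[0]))  # all-zero block of record length
    for idx in subset:
        acc = xor_bytes(acc, database[idx])
    return acc


def retrieve(database: list[bytes], wanted: int) -> bytes:
    """Client side: query both servers and combine the two answers.

    Both calls use the same `database` object here only to simulate
    two servers holding identical replicas.
    """
    q1, q2 = make_queries(len(database), wanted)
    a1 = server_answer(database, q1)  # answer from server 1
    a2 = server_answer(database, q2)  # answer from server 2
    # Every record except `wanted` appears in both answers or in neither,
    # so it cancels out under XOR and only the wanted record remains.
    return xor_bytes(a1, a2)
```

With `records = [b"rec0", b"rec1", b"rec2"]`, calling `retrieve(records, 2)` returns `records[2]`. Each server XORs over about half of the database per query, so server-side work grows linearly with the number of records. This is a property of this toy scheme and is not a measurement from the experiments described above.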