A Study of Malware Behaviour of Webpages
Malware is one of the most common security threats that users encounter when browsing webpages. Analysing and mitigating malware behaviour in a webpage requires a good understanding of webpage features (e.g. internet protocol, port, URL, Google index, HTTPS token, and page rank). This paper reports an experimental analysis that identifies the webpage features most vulnerable to malware attack. To improve feature selection accuracy, a machine learning technique called bagging is employed. To analyse this behaviour, phishing and botnet data were obtained from the University of California, Irvine (UCI) Machine Learning Repository. To validate the findings, a honeypot infrastructure was deployed using the Modern Honeypot Network (MHN) set up on a Linode server. Because the type of data varies widely from row to row, bagging was chosen: it can handle binary, date, nominal, numeric, unary and empty classes, as well as missing values. Random tree was used as the base classifier for bagging because it handles the same types of data as bagging and is faster and more accurate than other classifiers. The findings show that all features in the botnet dataset, with the exception of TCP and UDP, are equally important for identifying malicious behaviour, as each scored more than 97%. The experiments also showed that the average accuracy on the phishing and botnet datasets was more than 89% in both cross-validation and test analysis. The study concludes by offering recommendations and directions for future research that may assist in malware identification.
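As an illustration of the technique named above, the following minimal Python sketch shows bagging with a random tree as the base classifier: each tree is trained on a bootstrap sample, considers only a random subset of features at every split, and the ensemble predicts by majority vote. It is a sketch under stated assumptions, not the implementation used in the study. All function names, the four ternary feature columns, the labelling rule and the parameter values are hypothetical, and the data are synthetic rather than drawn from the UCI datasets.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_random_tree(rows, labels, rng, n_candidates, max_depth):
    """Grow a tree that examines only a random subset of features per split."""
    if max_depth == 0 or len(set(labels)) == 1:
        return majority(labels)  # leaf node: a plain class label
    best = None
    for f in rng.sample(range(len(rows[0])), n_candidates):
        for t in set(r[f] for r in rows):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # this threshold does not separate the rows
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return majority(labels)
    _, f, t = best
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    grow = lambda idx: build_random_tree(
        [rows[i] for i in idx], [labels[i] for i in idx], rng, n_candidates, max_depth - 1
    )
    return (f, t, grow(li), grow(ri))  # internal node


def tree_predict(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def bagging_fit(rows, labels, n_trees=25, n_candidates=2, max_depth=5, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(build_random_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], rng, n_candidates, max_depth
        ))
    return trees


def bagging_predict(trees, row):
    """Majority vote over the predictions of all trees."""
    return majority([tree_predict(t, row) for t in trees])


# Synthetic data (assumption): four ternary features in {-1, 0, 1};
# a row is labelled malicious (1) when the first two features sum above zero.
data_rng = random.Random(42)
rows = [[data_rng.choice([-1, 0, 1]) for _ in range(4)] for _ in range(300)]
labels = [1 if r[0] + r[1] > 0 else 0 for r in rows]

train_rows, test_rows = rows[:200], rows[200:]
train_labels, test_labels = labels[:200], labels[200:]

trees = bagging_fit(train_rows, train_labels)
correct = sum(bagging_predict(trees, r) == y for r, y in zip(test_rows, test_labels))
accuracy = correct / len(test_rows)
print(f"hold-out accuracy: {accuracy:.2f}")
```

The sketch evaluates on a single hold-out split only; a cross-validation estimate like the one reported above would repeat the fit and scoring over several folds and average the results.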