Exploring Malware Behavior of Webpages Using Machine Learning Technique: An Empirical Study

Alwaghid, AF; Sarkar, NI

aut.relation.endpage	1033
aut.relation.issue	6	en_NZ
aut.relation.journal	Electronics	en_NZ
aut.relation.startpage	1033
aut.relation.volume	9	en_NZ
aut.researcher	Sarkar, Nurul
dc.contributor.author	Alwaghid, AF	en_NZ
dc.contributor.author	Sarkar, NI	en_NZ
dc.date.accessioned	2020-07-01T04:18:26Z
dc.date.available	2020-07-01T04:18:26Z
dc.date.copyright	2020-06-23	en_NZ
dc.date.issued	2020-06-23	en_NZ
dc.description.abstract	Malware is one of the most common security threats that users encounter when browsing webpages. A good understanding of the features of webpages (e.g., Internet Protocol address, port, URL, Google index, and page rank) is required to analyze and mitigate the behavior of malware in webpages. The main objective of this paper is to analyze these key features and to mitigate the behavior of malware in webpages. To this end, we conducted an empirical study to identify the features that are most vulnerable to malware attacks, and we report its results. To improve the accuracy of feature selection, a machine learning technique called bagging is employed using the Weka program. To analyze these behaviors, phishing and botnet data were obtained from the University of California Irvine machine learning repository. We validate our research findings with honeypot infrastructure, using a Modern Honeypot Network (MHN) setup on a Linode server. As the data suffer from high variance in the type of data in each row, bagging is chosen because it can handle binary classes, date classes, missing values, nominal classes, numeric classes, unary classes, and empty classes. Random tree was applied as the base classifier of bagging because it can handle the same types of data as bagging and is faster and more accurate than the other classifiers. Random tree had 88.22% test accuracy with the lowest run time (0.2 s) and a receiver operating characteristic (ROC) area of 0.946. Results show that all features in the botnet dataset are equally important for identifying malicious behavior, as all scored more than 97%, with the exception of TCP and UDP. The accuracy on the phishing and botnet datasets is more than 89% on average in both cross-validation and test analysis. Recommendations are made for best practices that can assist in future malware identification.	en_NZ
dc.identifier.citation	Electronics 2020, 9(6), 1033; https://doi.org/10.3390/electronics9061033
dc.identifier.doi	10.3390/electronics9061033	en_NZ
dc.identifier.issn	2079-9292	en_NZ
dc.identifier.uri	https://hdl.handle.net/10292/13479
dc.language	en	en_NZ
dc.publisher	MDPI AG	en_NZ
dc.relation.uri	https://www.mdpi.com/2079-9292/9/6/1033
dc.rights	© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
dc.rights.accessrights	OpenAccess	en_NZ
dc.subject	Ensemble method; Malicious software; Bagging; Random tree; Feature selection
dc.title	Exploring Malware Behavior of Webpages Using Machine Learning Technique: An Empirical Study	en_NZ
dc.type	Journal Article
pubs.elements-id	379434
pubs.organisational-data	/AUT
pubs.organisational-data	/AUT/Design & Creative Technologies
pubs.organisational-data	/AUT/Design & Creative Technologies/Engineering, Computer & Mathematical Sciences
pubs.organisational-data	/AUT/PBRF
pubs.organisational-data	/AUT/PBRF/PBRF Design and Creative Technologies
pubs.organisational-data	/AUT/PBRF/PBRF Design and Creative Technologies/PBRF ECMS

Files

Original bundle

Name: electronics-09-01033.pdf
Size: 2.18 MB
Format: Adobe Portable Document Format
Description: Journal article

License bundle

Name: AUT Grant of Licence for Tuwhera Aug 2018.pdf
Size: 276.29 KB
Format: Adobe Portable Document Format

Collections

School of Engineering, Computer and Mathematical Sciences - Te Kura Mātai Pūhanga, Rorohiko, Pāngarau