Detecting Business Email Compromise and Classifying for Countermeasures

Godakanda Arachchige, Pubudu Gayan Buddhika
Cusack, Brian
Degree name: Master of Philosophy
Auckland University of Technology

The use of email has evolved radically since 1965, when the first email was sent. That first message was simple text, much like a sticky note, and anyone who used the same computer could see it. Today emails carry multimedia content, run on dedicated servers or cloud services, and sit at the core of global communications. With advances in technology and the expansion of the Internet, email has become an essential communication tool for individuals and organisations. Because millions of users now exchange messages and information by email, it attracts perpetrators who seek to steal that information. The commercial use of email has further heightened the motivation of hackers and attackers to compromise email communications and exploit them for their own gain. In response, IT security experts have introduced strong encryption methods to protect email messages and their contents, and various protocols and security standards have been implemented to protect email communication channels. However, Rose (2021) reports that most email and IT-related security issues today arise from human error. These errors stem from a lack of awareness of security threats, failure to follow instructions, and insecure local devices.

In the first phase of the experimental work, pilot spam emails are collected to build the corpora for training the NLP model. The NLP model initially uses binary classification to categorise spam into harmful and harmless emails. In the second phase, I build a Python program that uses NLP to classify emails as spam or ham (legitimate email). At the same time, I set up an isolated mock network to collect the spam emails that arrive in response to user behaviour under controlled, scripted actions. In the third phase, I train the NLP model and use the collected spam emails to measure how accurate it is.

The findings report the model's accuracy and efficiency and show how user behaviour on the Internet influences the number of incoming spam emails. It is concluded that security models can be tricked by the similarity between harmful and harmless emails, because attackers are highly skilled at crafting harmful emails that resemble legitimate ones. The trained NLP model improves protection, adapts to new threats, and keeps learning, but it is sometimes still not enough to identify all spam emails. Therefore, in the last phase of the research I design a new framework (Figure 5.1) to improve business email security by identifying spam and other false emails before they reach user inboxes. The results also signal the importance of training and of self-monitoring all behaviour on the Internet, as this influences the amount and type of spam attacks received.
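The binary spam/ham classification step described above can be sketched in Python. The abstract does not say which NLP classifier was trained, so this sketch substitutes multinomial Naive Bayes over bag-of-words tokens with Laplace smoothing, a common spam-filtering baseline. The class `NaiveBayesSpamFilter`, the helpers `tokenize` and `accuracy`, and the six-email training corpus are hypothetical illustrations, not the author's code or data.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Hypothetical toy corpus for illustration only; not data from the thesis.
TRAIN_TEXTS = [
    "Verify your account password now to avoid suspension",
    "Urgent wire transfer needed, reply with bank details",
    "You have won a prize, click the link to claim",
    "Meeting moved to 3pm, see the updated agenda",
    "Please review the attached quarterly report draft",
    "Lunch with the project team on Friday?",
]
TRAIN_LABELS = ["spam", "spam", "spam", "ham", "ham", "ham"]


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens (bag of words)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing.

    Hypothetical stand-in for the thesis's NLP model: it learns per-label
    word frequencies and picks the label with the highest log-probability.
    """

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha
        self.doc_counts: Counter[str] = Counter()       # emails per label
        self.word_counts: dict[str, Counter[str]] = {}  # token counts per label
        self.vocab: set[str] = set()                    # all training tokens

    def fit(self, texts: list[str], labels: list[str]) -> NaiveBayesSpamFilter:
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        return self

    def _log_score(self, tokens: list[str], label: str) -> float:
        # log P(label) + sum of log P(token | label); unseen tokens are skipped.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        counts = self.word_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:
                score += math.log((counts[tok] + self.alpha) / denom)
        return score

    def predict(self, text: str) -> str:
        tokens = tokenize(text)
        return max(self.doc_counts, key=lambda label: self._log_score(tokens, label))


def accuracy(model: NaiveBayesSpamFilter, texts: list[str], labels: list[str]) -> float:
    """Fraction of emails whose predicted label matches the true label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


model = NaiveBayesSpamFilter().fit(TRAIN_TEXTS, TRAIN_LABELS)
print(model.predict("Click the link to verify your bank password"))  # spam
```

Naive Bayes is chosen here only because it fits in a few lines using the standard library. With real corpora, such as those collected from the mock network, accuracy would be measured on emails held out from training rather than on the training set itself.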
