Malware motif identification using Bio-inspired Data Mining

Date
2013
Authors
Chen, Yi
Supervisor
Narayanan, Ajit
Item type
Thesis
Degree name
Master of Computer and Information Sciences
Publisher
Auckland University of Technology
Abstract

The application of data mining techniques to biological data is well established. The aim of this thesis is to explore the effects of giving an amino acid representation to problematic machine learning data, and to evaluate the benefits of supplementing traditional data mining techniques with bioinformatics tools, techniques and databases. The research focuses on methods for identifying patterns in the computer malware signatures typically used in current anti-virus software. In total, 60 computer virus signatures and 60 worm signatures were converted into amino acid representations and then aligned to produce fixed-length sequences as input to data mining techniques for classification and prediction. Standard protein databases and modellers were also used to give a biological interpretation and to find biological analogues of the polypeptide representations of the malware signatures. Protein modelling of the consensus sequences produced through sequence alignment, and of the meta-signatures extracted through data mining, provides novel ways of looking at malware signatures and their possible structure and function. However, the results varied with the method of biological representation used, and further work is needed to determine the advantages and disadvantages of different methods for representing data as artificial polypeptide sequences.

Keywords
Data mining, Malware, Bio-informatics