The effects of different representations on static structure analysis of computer malware signatures
The continuous growth of malware presents a problem for internet computing due to increasingly sophisticated techniques for disguising malicious code through mutation and the time required to identify signatures for use by antiviral software systems (AVS). Malware modelling has focused primarily on semantics due to the intended actions and behaviours of viral and worm code. The aim of this paper is to evaluate a static structure approach to malware modelling using the growing malware signature databases now available. We show that, if malware signatures are represented as artificial protein sequences, it is possible to apply standard sequence alignment techniques in bioinformatics to improve accuracy of distinguishing between worm and virus signatures. Moreover, aligned signature sequences can be mined through traditional data mining techniques to extract metasignatures that help to distinguish between viral and worm signatures. All bioinformatics and data mining analysis were performed on publicly available tools and Weka.