Software forensics applied to the task of discriminating between program authors
MacDonell, SG; Gray, AR
Software forensics is here regarded as the particular field of inquiry that, by treating pieces of program source code as linguistically and stylistically analyzable entities, attempts to investigate various aspects of computer program authorship. These inquiries could be performed with any number of goals in mind, including those of identification, discrimination and characterization of authors. In this paper we extract a set of 26 authorship-related metrics from 351 source code programs, written by 7 different authors. The use of feed-forward neural network (FFNN), multiple discriminant analysis (MDA), and case-based reasoning (CBR) models for discriminating these programs is then investigated in terms of classification accuracy for the authors on both training and testing (holdout) samples. The first two techniques (FFNN and MDA) produce remarkably similar results, with the overall best results coming from the CBR models. All of the examined modelling techniques have prediction accuracy rates of over 80%, supporting the claim that it is feasible to use such techniques for the task of discriminating program authors based on source-code measurements in a majority of cases.
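To make the general idea concrete, the following is a minimal sketch of case-based discrimination on a holdout sample, reduced to a single nearest-neighbour lookup over metric vectors. It is not the paper's implementation: the three metrics (mean line length, comment ratio, blank-line ratio), the author labels, the toy values and all function names are hypothetical and chosen only for illustration.

```python
import math

def normalise(rows):
    """Fit min-max scaling on `rows`; return the scaled rows and the scaler."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid division by zero

    def scale(row):
        return [(v - lo) / s for v, lo, s in zip(row, lows, spans)]

    return [scale(r) for r in rows], scale

def nearest_author(case_base, query):
    """Return the author of the stored case closest to `query` (Euclidean)."""
    return min(case_base, key=lambda case: math.dist(case[0], query))[1]

def holdout_accuracy(train, test):
    """Fraction of holdout programs assigned to their true author."""
    scaled, scale = normalise([metrics for metrics, _ in train])
    case_base = list(zip(scaled, [author for _, author in train]))
    hits = sum(nearest_author(case_base, scale(m)) == a for m, a in test)
    return hits / len(test)

# Hypothetical toy data: (mean line length, comment ratio, blank-line ratio).
TRAIN = [
    ([40.0, 0.10, 0.20], "A"),
    ([44.0, 0.12, 0.18], "A"),
    ([70.0, 0.30, 0.05], "B"),
    ([66.0, 0.28, 0.08], "B"),
]
TEST = [
    ([42.0, 0.11, 0.19], "A"),
    ([68.0, 0.29, 0.06], "B"),
]

if __name__ == "__main__":
    print(holdout_accuracy(TRAIN, TEST))
```

The scaling is fitted on the training cases only and then reused for the holdout programs, so that the test sample does not influence the case base.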