IDENTIFIED (integrated dictionary-based extraction of non-language-dependent token information for forensic identification, examination, and discrimination): A dictionary-based system for extracting source code metrics for software forensics
The frequency and severity of computer-based attacks such as viruses and worms, logic bombs, trojan horses, computer fraud, and plagiarism of software code have all become of increasing concern to many of those involved with information systems. Part of the difficulty experienced in collecting evidence regarding the attack or theft in such situations has been the definition and collection of appropriate measurements to use in models of authorship. With this purpose in mind a system called IDENTIFIED is being developed to assist with the task of software forensics which is the use of software code authorship analysis for legal or official purposes. IDENTIFIED uses combinations of wildcards and special characters to define count-based metrics, allows for hierarchical metametric definitions, automates much of the file handling task, extracts metric values from source code, and assists with the analysis and modelling processes. It is hoped that the availability of such tools will encourage more detailed research into this area of ever-increasing importance.