IDENTIFIED (integrated dictionary-based extraction of non-language-dependent token information for forensic identification, examination, and discrimination): A dictionary-based system for extracting source code metrics for software forensics

Gray, A
Sallis, P
MacDonell, S
Item type
Conference Contribution

The frequency and severity of computer-based attacks such as viruses and worms, logic bombs, trojan horses, computer fraud, and plagiarism of software code are of increasing concern to many of those involved with information systems. Part of the difficulty in collecting evidence about the attack or theft in such situations has been defining and collecting appropriate measurements to use in models of authorship. To address this, a system called IDENTIFIED is being developed to assist with the task of software forensics, which is the use of software code authorship analysis for legal or official purposes. IDENTIFIED uses combinations of wildcards and special characters to define count-based metrics, allows for hierarchical meta-metric definitions, automates much of the file handling task, extracts metric values from source code, and assists with the analysis and modelling processes. It is hoped that the availability of such tools will encourage more detailed research into this area of ever-increasing importance.
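
As a rough illustration of dictionary-based, count-based metrics with hierarchical meta-metrics, the following Python sketch may help. It is not IDENTIFIED's implementation, and the abstract does not specify the dictionary notation: the shell-style wildcards from Python's `fnmatch` module, the tokeniser, and every metric name (`while_loops`, `capitalised_names`, `comments_per_loop`, and so on) are assumptions made for this example only.

```python
import fnmatch
import re

# Hypothetical dictionary: metric name -> wildcard pattern. The patterns use
# fnmatch syntax as a stand-in; IDENTIFIED's real notation is not shown here.
METRICS = {
    "while_loops": "while",
    "for_loops": "for",
    "comments": "//*",               # any token beginning with //
    "capitalised_names": "[A-Z]*",   # any token beginning with an upper-case letter
}

# Hypothetical meta-metrics, evaluated in order, so a later definition can
# build on an earlier one (a simple form of hierarchy).
META_METRICS = {
    "loops": lambda v: v["while_loops"] + v["for_loops"],
    "comments_per_loop": lambda v: v["comments"] / v["loops"] if v["loops"] else 0.0,
}

# Assumed tokeniser: a // comment is one token, then identifiers, then any
# other single non-space character.
TOKEN_RE = re.compile(r"//[^\n]*|[A-Za-z_]\w*|\S")


def extract(source):
    """Count tokens matching each dictionary entry, then derive meta-metrics."""
    tokens = TOKEN_RE.findall(source)
    values = {
        name: sum(1 for tok in tokens if fnmatch.fnmatchcase(tok, pattern))
        for name, pattern in METRICS.items()
    }
    for name, formula in META_METRICS.items():
        values[name] = formula(values)
    return values


SAMPLE = """// sum the first n integers
int Total = 0;
for (int i = 0; i < n; i++) { Total += i; }
while (Total > LIMIT) { Total--; } // clamp
"""

if __name__ == "__main__":
    for name, value in extract(SAMPLE).items():
        print(f"{name}: {value}")
```

Keeping the patterns in a dictionary rather than in code means a new metric can be added by editing data only, which is presumably part of the appeal of a dictionary-based design; the choice of lambdas for the meta-metrics is purely a convenience of this sketch.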

Software Engineering: Education & Practice, 1998. Proceedings. 1998 International Conference, pages 252-259.
Publisher's version
Rights statement
Copyright © 1998 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.