Examining the significance of high-level programming features in source code author classification

Frantzeskou, G; MacDonell, S; Stamatatos, E; Gritzalis, S

dc.contributor.author	Frantzeskou, G
dc.contributor.author	MacDonell, S
dc.contributor.author	Stamatatos, E
dc.contributor.author	Gritzalis, S
dc.date.accessioned	2011-11-10T06:36:29Z
dc.date.available	2011-11-10T06:36:29Z
dc.date.copyright	2008-03
dc.date.issued	2008-03
dc.description.abstract	The use of Source Code Author Profiles (SCAP) represents a new, highly accurate approach to source code authorship identification that is, unlike previous methods, language independent. While accuracy is clearly a crucial requirement of any author identification method, in cases of litigation regarding authorship, plagiarism, and so on, there is also a need to know why it is claimed that a piece of code is written by a particular author. What is it about that piece of code that suggests a particular author? What features in the code make one author more likely than another? In this study, we describe a means of identifying the high-level features that contribute to source code authorship identification, using the SCAP method as a tool. A variety of features are considered for Java and Common Lisp, and the importance of each feature in determining authorship is measured through a sequence of experiments in which we remove one feature at a time. The results show that, for these programs, comments, layout features and package-related naming influence classification accuracy, whereas user-defined naming, an obvious programmer-related feature, does not appear to influence accuracy. A comparison is also made between the relative feature contributions in programs written in the two languages.
dc.identifier.citation	Journal of Systems and Software, vol.81(3), pp.447 - 460
dc.identifier.doi	10.1016/j.jss.2007.03.004
dc.identifier.issn	0164-1212 (print) 1873-1228 (online)
dc.identifier.roid	3601	en_NZ
dc.identifier.uri	https://hdl.handle.net/10292/2490
dc.publisher	Elsevier
dc.relation.uri	http://dx.doi.org/10.1016/j.jss.2007.03.004
dc.rights	Copyright © 2008 Elsevier Ltd. All rights reserved. This is the author’s version of a work that was accepted for publication in (see Citation). Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. The definitive version was published in (see Citation). The original publication is available at (see Publisher's Version)
dc.rights.accessrights	OpenAccess
dc.subject	Authorship
dc.subject	Source code
dc.subject	Program features
dc.subject	Fraud
dc.subject	Computer
dc.subject	Style
dc.title	Examining the significance of high-level programming features in source code author classification
dc.type	Journal Article
pubs.organisational-data	/AUT
pubs.organisational-data	/AUT/Design & Creative Technologies
pubs.organisational-data	/AUT/Design & Creative Technologies/School of Computing & Mathematical Science
pubs.organisational-data	/AUT/PBRF Researchers
pubs.organisational-data	/AUT/PBRF Researchers/Design & Creative Technologies PBRF Researchers
pubs.organisational-data	/AUT/PBRF Researchers/Design & Creative Technologies PBRF Researchers/DCT C & M Computing

Files

Original bundle

Name: Frantzeskou, MacDonell, Stamatatos and Gritzalis (2008) JSS.pdf
Size: 260.98 KB
Format: Adobe Portable Document Format

Collections

SERG - Software Engineering Research Group