Multi-metric prediction of software build outcomes
This thesis details the design, implementation and evaluation of software prediction models that address some of the challenges of identifying and mitigating the risks of a software development project. Being able to predict potential failures during a software development project is critical to project success and has been the subject of decades of research. Despite this research and its importance to the software domain, much about software project success and failure remains unknown. This is partly due to the limited software project data available to researchers and the challenges of capturing the relationships between various software artifacts. It is also partly due to the way data is represented, misinterpreted, or simply not captured and made available within existing software projects. As a result, there is very little reported research that attempts to combine software metrics and social network metrics in order to predict software success and failure.
Software metrics extracted from the source code files of a system during its development were employed to create novel prediction models of software success and failure. The social component of a globally distributed software development team was also investigated using social network metrics. These social network metrics were directly mapped to software metrics in order to predict software build outcomes. This thesis presents the results of the first extensive source code analysis of a live software project (IBM's Jazz repository) using a range of traditional data mining methods. A novel data mining approach is reported in which a combination of software and social network metrics is used to create software build prediction models. Additionally, data stream mining techniques were used to construct models for software build prediction. Data stream mining was found to offer a powerful solution for monitoring the evolution of source code metrics and social network metrics over time.
The results show that, with aggregated software metrics and social network metrics, software build failure is more difficult to predict than build success. The results also indicate that a combination of software metrics and social network metrics does not enhance prediction accuracy. However, when used in parallel, the two kinds of metrics potentially provide an effective decision-making tool for avoiding failure.