SERL - Software Engineering Research Laboratory

Permanent link for this collection

https://hdl.handle.net/10292/1680

The Software Engineering Research Lab (SERL) at AUT University undertakes world-class research directed at understanding and improving the practice of software professionals in their creation and preservation of software systems. We are interested in all models of software provision – bespoke development, package and component customisation, free/libre open source software (FLOSS) development, and delivery of software as a service (SaaS). The research we carry out may relate to just one or all of these models.

Browse

Now showing 1 - 5 of 136

Just-in-Time Crash Prediction for Mobile Apps
(Springer Science and Business Media LLC, 2024-05-08) Wimalasooriya, C; Licorish, SA; da Costa, DA; MacDonell, SG
Just-In-Time (JIT) defect prediction aims to identify defects early, at commit time. Hence, developers can take precautions to avoid defects when the code changes are still fresh in their minds. However, the utility of JIT defect prediction has not been investigated in relation to crashes of mobile apps. We therefore conducted a multi-case study employing both quantitative and qualitative analysis. In the quantitative analysis, we used machine learning techniques for prediction. We collected 113 reliability-related metrics for about 30,000 commits from 14 Android apps and selected 14 important metrics for prediction. We found that both standard JIT metrics and static analysis warnings are important for JIT prediction of mobile app crashes. We further optimized prediction performance, comparing seven state-of-the-art defect prediction techniques with hyperparameter optimization. Our results showed that Random Forest is the best performing model with an AUC-ROC of 0.83. In our qualitative analysis, we manually analysed a sample of 642 commits and identified different types of changes that are common in crash-inducing commits. We explored whether different aspects of changes can be used as metrics in JIT models to improve prediction performance. We found these metrics improve the prediction performance significantly. Hence, we suggest considering static analysis warnings and Android-specific metrics to adapt standard JIT defect prediction models for a mobile context to predict crashes. Finally, we provide recommendations to bridge the gap between research and practice and point to opportunities for future research.
Improving Transfer Learning for Software Cross-Project Defect Prediction
(Springer Science and Business Media LLC, 2024-04-24) Omondiagbe, OP; Licorish, SA; MacDonell, SG
Software cross-project defect prediction (CPDP) makes use of cross-project (CP) data to overcome the lack of data necessary to train well-performing software defect prediction (SDP) classifiers in the early stage of new software projects. Since the CP data (known as the source) may be different from the new project’s data (known as the target), this makes it difficult for CPDP classifiers to perform well. In particular, it is a mismatch of data distributions between source and target that creates this difficulty. Transfer learning-based CPDP classifiers are designed to minimize these distribution differences. The first Transfer learning-based CPDP classifiers treated these differences equally, thereby degrading prediction performance. To this end, recent research has the Weighted Balanced Distribution Adaptation (W-BDA) method to leverage the importance of both distribution differences to improve classification performance. Although W-BDA has been shown to improve model performance in CPDP and tackle the class imbalance by balancing the class proportion of each domain, research to date has failed to consider model performance in light of increasing target data. We provide the first investigation studying the effects of increasing the target data when leveraging the importance of both distribution differences. We extend the initial W-BDA method and call this extension the W-BDA+ method. To evaluate the effectiveness of W-BDA+ for improving CPDP performance, we conduct eight experiments on 18 projects from four datasets, where data sampling was performed with different sampling methods. Data sampling was only performed on the baseline methods and not on our proposed W-BDA+ and the original W-BDA because data sampling issues do not exist for these two methods. We evaluate our method using four complementary indicators (i.e., Balanced Accuracy, AUC, F-measure and G-Measure). Our findings reveal an average improvement of 6%, 7.5%, 10% and 12% for these four indicators when W-BDA+ is compared to the original W-BDA and five other baseline methods (for all four of the sampling methods used). Also, as the target to source ratio is increased with different sampling methods, we observe a decrease in performance for the original W-BDA, with our W-BDA+ approach outperforming the original W-BDA in most cases. Our results highlight the importance of having an awareness of the effect of the increasing availability of target data in CPDP scenarios when using a method that can handle the class imbalance problem.
Progress Report on a Proposed Theory for Software Development
(SciTePress, 2015-08) Kirk, D; MacDonell, S
There is growing acknowledgement within the software engineering community that a theory of software development is needed to integrate the myriad methodologies that are currently popular, some of which are based on opposing perspectives. We have been developing such a theory for a number of years. In this position paper, we overview our theory along with progress made thus far. We suggest that, once fully developed, this theory, or one similar to it, may be applied to support situated software development, by providing an overarching model within which software initiatives might be categorised and understood. Such understanding would inevitably lead to greater predictability with respect to outcomes.
Packaged Software Implementation Requirements Engineering by Small Software Enterprises
(IEEE Computer Society, 2013) Jebreen, I; Wellington, R; MacDonell, SG
Small to medium sized business enterprises (SMEs) generally thrive because they have successfully done something unique within a niche market. For this reason, SMEs may seek to protect their competitive advantage by avoiding any standardization encouraged by the use of packaged software (PS). Packaged software implementation at SMEs therefore presents challenges relating to how best to respond to mismatches between the functionality offered by the packaged software and each SME's business needs. An important question relates to which processes small software enterprises - or Small to Medium-Sized Software Development Companies (SMSSDCs) - apply in order to identify and then deal with these mismatches. To explore the processes of packaged software (PS) implementation, an ethnographic study was conducted to gain in-depth insights into the roles played by analysts in two SMSSDCs. The purpose of the study was to understand PS implementation in terms of requirements engineering (or 'PSIRE'). Data collected during the ethnographic study were analyzed using an inductive approach. Based on our analysis of the cases we constructed a theoretical model explaining the requirements engineering process for PS implementation, and named it the PSIRE Parallel Star Model. The Parallel Star Model shows that during PSIRE, more than one RE process can be carried out at the same time. The Parallel Star Model has few constraints, because not only can processes be carried out in parallel, but they do not always have to be followed in a particular order. This paper therefore offers a novel investigation and explanation of RE practices for packaged software implementation, approaching the phenomenon from the viewpoint of the analysts, and offers the first extensive study of packaged software implementation RE (PSIRE) in SMSSDCs.
A Taxonomy of Data Quality Challenges in Empirical Software Engineering
(IEEE, 2013) Bosu, MF; Macdonell, SG
Reliable empirical models such as those used in software effort estimation or defect prediction are inherently dependent on the data from which they are built. As demands for process and product improvement continue to grow, the quality of the data used in measurement and prediction systems warrants increasingly close scrutiny. In this paper we propose a taxonomy of data quality challenges in empirical software engineering, based on an extensive review of prior research. We consider current assessment techniques for each quality issue and proposed mechanisms to address these issues, where available. Our taxonomy classifies data quality issues into three broad areas: first, characteristics of data that mean they are not fit for modeling, second, data set characteristics that lead to concerns about the suitability of applying a given model to another data set, and third, factors that prevent or limit data accessibility and trust. We identify this latter area as of particular need in terms of further research. © 2013 IEEE.

Browse

Recent Submissions