Semi-automated Extraction of New Product Features from Online Reviews to Support Software Product Evolution
Throughout its lifetime, a software product is expected to continually improve and evolve to remain efficient, useful and desirable to users, both existing and potential. This involves incrementally developing new product features and improvements that are then integrated into the existing product and delivered to users as a new release. Deciding which features to include in the next release is part of the release planning process, which involves identifying potential new features, then filtering and prioritising them for the next release. A readily available resource for finding these candidate release requirements is users' online reviews of the product, based on their experience of using it; it is common to find users suggesting new product features to meet their specific needs in these reviews. Sorting through such feedback manually, however, can be very time-consuming, as a large volume of unstructured and noisy data must be sifted for suggested features. This thesis proposes a novel method for extracting potential new features from online user reviews through semi-automated, machine-based analysis. A semi-automated tool is implemented in Python that extracts, classifies and orders the requirements of product users from their online reviews, incorporating a sequence of text processing and classification techniques. Online reviews of two software products, Jira and Trello, serve as cases for the feasibility testing and optimisation of the proposed process. The main process first uses a supervised machine learning approach to distinguish review sentences that represent new features from other review content. Then, after a de-duplication step, multiclass multi-label classification is applied to the remaining set of candidate release requirements to group them into functional categories.
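The binary filtering step described above can be sketched as follows. This is an illustrative interpretation rather than the thesis implementation: the review sentences and labels are invented, the sentiment input used in the best-performing model is omitted for brevity, and scikit-learn's `TfidfVectorizer` and `LinearSVC` are assumed as the n-gram and linear SVM components.

```python
# Illustrative sketch (not the thesis code): filtering review sentences
# that suggest new features from other review content, using a linear
# SVM over word n-grams (up to 4-grams, as in the best reported model).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented training examples: 1 = feature suggestion, 0 = other content.
sentences = [
    "Please add a dark mode to the board view",
    "It would be great if cards could have custom fields",
    "The app crashes whenever I open a large project",
    "I love how simple the interface is",
]
labels = [1, 1, 0, 0]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 4)),  # unigrams up to 4-grams
    LinearSVC(),
)
clf.fit(sentences, labels)

# Classify an unseen review sentence.
print(clf.predict(["Could you add an offline mode?"]))
```

In practice the thesis validates such models with 10-fold cross-validation; with realistic training volumes the pipeline above would be wrapped in `cross_val_score` rather than fitted once.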
This grouping of the extracted features provides a structure within which the product manager can subsequently analyse and prioritise the candidate features. The target categories are manually defined from knowledge of the product functionality. Several techniques for extracting the features are explored, and the optimal classification models are identified based on precision, recall and F-measure metrics. Several iterations of experiments are conducted with control variables such as n-grams, sentiment analysis and training sample size. Three learning algorithms are used in the experiments: Naïve Bayes (with multinomial and Bernoulli variants), Support Vector Machines (with linear and multinomial variants) and Logistic Regression. Semantic similarity measures are used before categorisation to reduce sentence duplication. The ranking of user requirements by category and date is performed by term-weighted frequency, which surfaces the features most frequently mentioned by app users. In the evaluation of the classifications, the best models perform well on precision, recall and F-measure. In binary filtering, a linear SVM taking the review sentence and its sentiment as input, using 4-grams and 10-fold cross-validation, is the best model for filtering feature-request sentences, with a precision of 90.9%, recall of 91.1% and F1 of 91.0%. This model also performs well on simulated input data, obtained both by splitting the experimental dataset and from an external dataset. For multiclass multi-label classification, a linear SVM with 5-fold cross-validation and unigram features shows the top performance on 5-class classification, with 86% precision, 78% recall and 82% F-score on the experimental dataset, and 89% precision, 81% recall and 83% F-score on the simulated dataset.
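The multiclass multi-label grouping step can be sketched in the same spirit. Again this is an illustrative assumption, not the thesis code: the feature sentences and category names are invented, and a one-vs-rest arrangement of linear SVMs over unigram features is assumed as one standard way to realise multi-label classification.

```python
# Illustrative sketch (not the thesis code): grouping extracted feature
# requests into manually defined functional categories, where one
# request may belong to several categories (multi-label).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Invented examples; real categories come from product knowledge.
features = [
    "add a calendar view for sprint deadlines",
    "let me export boards to PDF",
    "notify me when a card is overdue",
    "sync the calendar with reminders and notifications",
]
categories = [
    {"scheduling"},
    {"export"},
    {"notifications"},
    {"scheduling", "notifications"},  # multi-label example
]

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(categories)  # binary indicator matrix

clf = make_pipeline(
    TfidfVectorizer(),                 # unigram features, as in the best model
    OneVsRestClassifier(LinearSVC()),  # one binary SVM per category
)
clf.fit(features, y)

pred = clf.predict(["export the calendar to PDF"])
print(mlb.inverse_transform(pred))  # predicted category names
```

The one-vs-rest decomposition keeps each per-category decision a simple binary SVM, which matches the linear-SVM setting reported for the best multi-label results.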
Overall, the research conducted has confirmed the feasibility of semi-automated extraction of candidate release requirements from a large volume of unstructured and noisy online user reviews. Furthermore, specific machine-based analysis techniques have been evaluated and recommendations are made based on commonly used metrics. This proposed process can now be further expanded and extended to usefully support future product release planning.
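The term-weighted frequency ranking mentioned above can be illustrated with a minimal sketch. The scoring rule here, averaging each sentence's term frequencies over the whole set of extracted features, is one plausible reading of "term-weighted frequency", assumed for illustration rather than taken from the thesis, and the tokenisation is deliberately naive.

```python
# Illustrative sketch (not the thesis code): ranking candidate features
# so that those built from frequently mentioned terms rank higher.
import re
from collections import Counter

def rank_by_term_frequency(feature_sentences):
    """Score each sentence by the mean corpus frequency of its terms."""
    tokens_per_sentence = [
        re.findall(r"[a-z]+", s.lower()) for s in feature_sentences
    ]
    # Frequency of every term across all extracted feature sentences.
    corpus_counts = Counter(t for toks in tokens_per_sentence for t in toks)
    scores = [
        sum(corpus_counts[t] for t in toks) / max(len(toks), 1)
        for toks in tokens_per_sentence
    ]
    return sorted(zip(feature_sentences, scores), key=lambda p: -p[1])

ranked = rank_by_term_frequency([
    "add dark mode",
    "add offline mode",
    "fix the login page",
])
print(ranked[0][0])  # prints "add dark mode"
```

Because "add" and "mode" recur across requests, the mode-related suggestions outrank the one-off login fix; in the thesis the same idea is applied per category and date.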