Semi-automated Extraction of New Product Features from Online Reviews to Support Software Product Evolution

Volabouth, Phonephasouk


aut.embargo	No	en_NZ
aut.thirdpc.contains	No	en_NZ
aut.thirdpc.permission	No	en_NZ
aut.thirdpc.removed	No	en_NZ
dc.contributor.advisor	Buchan, Jim
dc.contributor.advisor	MacDonell, Stephen
dc.contributor.author	Volabouth, Phonephasouk
dc.date.accessioned	2017-11-24T03:13:19Z
dc.date.available	2017-11-24T03:13:19Z
dc.date.copyright	2017
dc.date.created	2017
dc.date.issued	2017
dc.date.updated	2017-11-24T02:35:35Z
dc.description.abstract	Throughout its lifetime, a software product is expected to improve and evolve continually to maintain its efficiency, utility and desirability to both existing and potential users. This involves incrementally developing new product features and improvements, which are then integrated into the existing product and released to users as a new product release. Deciding which features to include in the next release is part of the release planning process, which involves identifying potential new features, then filtering and prioritising them for the next release. A readily available source of such candidate release requirements is users’ online reviews of, and feedback on, the product, based on their experience of using it. In these reviews, users commonly suggest new product features to meet their specific needs. Sorting through such feedback manually can be very time-consuming, however, because a large volume of unstructured and noisy data must often be sifted through to find the suggested features. This thesis proposes a novel method for extracting potential new features from online user reviews using semi-automated, machine-based analysis. A semi-automated tool is implemented in Python that extracts, classifies and ranks product users’ requirements from their online reviews. The approach incorporates a sequence of text processing and classification techniques. Online reviews of two software products, Jira and Trello, are used as cases to test the feasibility of the proposed process and to optimise it. The main process first uses a supervised machine learning approach to distinguish review sentences that represent new features from sentences addressing other aspects of the product. Then, after a de-duplication step, multiclass multi-label classification is applied to the remaining candidate release requirements to sort them into functional groups. 
This grouping of the extracted features gives the product manager a structure for subsequently analysing and prioritising the candidate features. The target categories are defined manually from knowledge of the product's functionality. Several feature extraction techniques are explored, and the optimal classification models are identified based on precision, recall and F-measure. Several iterations of experiments are conducted, varying control variables such as n-grams, sentiment analysis and training sample size. Three learning algorithms are used in the experiments: Naïve Bayes (with multinomial and Bernoulli variants), Support Vector Machines (with linear and multinomial variants) and Logistic Regression. Semantic similarity measures are applied before categorisation to reduce duplicate sentences. Requirements are ranked by category and date using term-weighted frequency, which reflects the features most frequently mentioned by app users. In the evaluation of the classifications, our best models perform well on precision, recall and F-measure. For binary filtering, the best model for identifying feature-request sentences is a linear SVM that takes the review sentence and its sentiment as input, uses 4-gram features and is validated with 10-fold cross-validation; it achieves a precision of 90.9%, recall of 91.1% and F1 of 91.0%. This model also performs well on simulated input data, created by splitting the experimental dataset, and on an external dataset. For multiclass multi-label classification, a linear SVM with 5-fold cross-validation and unigram features performs best on 5-class classification, with 86% precision, 78% recall and 82% F-score on the experimental dataset, and 89% precision, 81% recall and 83% F-score on the simulated dataset. 
Overall, this research confirms the feasibility of semi-automatically extracting candidate release requirements from a large volume of unstructured and noisy online user reviews. Furthermore, specific machine-based analysis techniques have been evaluated, and recommendations are made based on commonly used metrics. The proposed process can now be expanded and extended to support future product release planning.	en_NZ
dc.identifier.uri	https://hdl.handle.net/10292/11026
dc.language.iso	en	en_NZ
dc.publisher	Auckland University of Technology
dc.rights.accessrights	OpenAccess
dc.subject	Software requirement	en_NZ
dc.subject	Feature request	en_NZ
dc.subject	Feature ranking	en_NZ
dc.subject	Term-weighted frequency	en_NZ
dc.subject	Machine learning	en_NZ
dc.subject	SVM	en_NZ
dc.subject	Software evolution	en_NZ
dc.subject	Software improvement	en_NZ
dc.title	Semi-automated Extraction of New Product Features from Online Reviews to Support Software Product Evolution	en_NZ
dc.type	Thesis
thesis.degree.grantor	Auckland University of Technology
thesis.degree.level	Masters Theses
thesis.degree.name	Master of Computer and Information Sciences	en_NZ
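
The abstract describes a de-duplication step that uses semantic similarity measures to remove near-duplicate feature requests before categorisation. The sketch below illustrates only the general idea of threshold-based de-duplication; it substitutes a simple lexical Jaccard similarity over word sets for the semantic similarity measures used in the thesis, and the names `tokens`, `jaccard`, `deduplicate` and the 0.6 threshold are hypothetical choices for illustration.

```python
import re


def tokens(sentence: str) -> set[str]:
    """Lower-case word tokens of a sentence (a crude tokenizer for illustration)."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(sentences: list[str], threshold: float = 0.6) -> list[str]:
    """Keep a sentence only if it is below `threshold` similarity to every kept one.

    Lexical Jaccard stands in for a semantic similarity measure here, and the
    0.6 threshold is a hypothetical default, not a value taken from the thesis.
    """
    kept: list[str] = []
    kept_tokens: list[set[str]] = []
    for s in sentences:
        t = tokens(s)
        if all(jaccard(t, k) < threshold for k in kept_tokens):
            kept.append(s)
            kept_tokens.append(t)
    return kept
```

For example, given "Please add dark mode to the board view.", "please add dark mode to the board view" and "Support exporting boards to CSV.", the second sentence is dropped as a duplicate of the first. A semantic measure (for example, one based on word embeddings) would also catch paraphrases that share few words, which lexical overlap misses.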
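
The models in the abstract are compared on precision, recall and F-measure. As a reminder of how these relate, the sketch below computes them from true-positive, false-positive and false-negative counts, with F1 as the harmonic mean of precision and recall; the function names are hypothetical. Applied to the reported binary-filter figures, `f_measure(0.909, 0.911)` gives approximately 0.910, consistent with the stated F1 of 91.0%.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1 gives F1, the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 from raw counts for one (binary) class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_measure(precision, recall)
```

For multi-label results averaged across categories, the F-score may be computed per class and then averaged, in which case it need not equal the harmonic mean of the averaged precision and recall.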
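
The abstract states that requirements are ranked by category and date using term-weighted frequency, but does not define the weighting. The sketch below assumes one plausible scheme: within each (category, period) group, a term's weight is the number of sentences that mention it, and a sentence's score is the mean weight of its terms, so requests echoing commonly mentioned features rise to the top. The stop list, the weighting and the names `terms` and `rank_requirements` are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop list; the thesis's actual preprocessing may differ.
STOPWORDS = {"a", "an", "the", "to", "of", "and", "for", "in", "on", "it", "is", "i", "please", "would"}


def terms(sentence: str) -> set[str]:
    """Distinct lower-case content words of a sentence."""
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    return {w for w in words if w not in STOPWORDS}


def rank_requirements(items):
    """Rank (category, period, sentence) triples within each (category, period) group.

    Assumed weighting: a term's weight is the number of sentences in the same
    group that mention it; a sentence's score is the mean weight of its terms.
    """
    groups = defaultdict(list)
    for category, period, sentence in items:
        groups[(category, period)].append(sentence)

    ranked = {}
    for key, sentences in groups.items():
        term_sets = [terms(s) for s in sentences]
        weight = Counter(t for ts in term_sets for t in ts)
        scored = [
            (sum(weight[t] for t in ts) / len(ts) if ts else 0.0, s)
            for ts, s in zip(term_sets, sentences)
        ]
        # Highest score first; sorted() is stable, so ties keep input order.
        ranked[key] = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return ranked
```

With the three hypothetical "Boards" requests "Add dark mode to boards", "Dark mode please" and "Export boards to CSV" in the same month, "Dark mode please" ranks first, because both of its terms are shared with another request.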

Files

Original bundle

Name: VolabouthP.pdf
Size: 3.31 MB
Format: Adobe Portable Document Format
Description: Whole thesis

License bundle

Name: license.txt
Size: 897 B
Format: Item-specific license agreed upon to submission

Collections

Masters Theses