A Taxonomy of Data Quality Challenges in Empirical Software Engineering

Bosu, MF; Macdonell, SG

doi:10.1109/ASWEC.2013.21

Date

2013

Authors

Bosu, MF

Macdonell, SG

Item type

Journal Article

Publisher

IEEE

Abstract

Reliable empirical models such as those used in software effort estimation or defect prediction are inherently dependent on the data from which they are built. As demands for process and product improvement continue to grow, the quality of the data used in measurement and prediction systems warrants increasingly close scrutiny. In this paper we propose a taxonomy of data quality challenges in empirical software engineering, based on an extensive review of prior research. We consider current assessment techniques for each quality issue and proposed mechanisms to address these issues, where available. Our taxonomy classifies data quality issues into three broad areas: first, characteristics of data that mean they are not fit for modeling, second, data set characteristics that lead to concerns about the suitability of applying a given model to another data set, and third, factors that prevent or limit data accessibility and trust. We identify this latter area as of particular need in terms of further research. © 2013 IEEE.

Keywords

Accessibility, Commercial sensitivity, Data quality, Empirical software engineering, Provenance, Trustworthiness

Source

Proceedings of the 22nd Australian Software Engineering Conference (ASWEC 2013), Melbourne, Australia, pp. 97-106. doi: 10.1109/ASWEC.2013.21

DOI

10.1109/ASWEC.2013.21

Publisher's version

http://dx.doi.org/10.1109/ASWEC.2013.21

Rights statement

Copyright © 2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Permanent link

https://hdl.handle.net/10292/10005

Collections

SERG - Software Engineering Research Group
