Data accumulation and software effort prediction
BACKGROUND: In practice, project managers are constrained by the incremental nature of data collection. Specifically, project observations accumulate one project at a time. Likewise, within-project data accumulate one stage or phase at a time. However, empirical researchers have given this perspective limited attention.
PROBLEM: Consequently, our analyses may be biased. On the one hand, our predictions may be optimistic because the entire data set is available when building models. On the other hand, they may be pessimistic because we fail to capitalize on the temporal nature of the data. Our goals are (i) to explore the impact of ignoring time when building cost prediction models and (ii) to show the benefits of re-estimating during a project using data from completed phases.
METHOD: Using a small industrial data set of sixteen software projects from a single organization, we compare predictive models developed using a time-aware approach with a more traditional leave-one-out analysis. We then investigate the impact of using requirements, design and implementation phase data on estimating the effort of subsequent phases.
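The contrast between the two validation schemes can be illustrated with a minimal sketch. It assumes a deliberately simple estimator (effort predicted as size times the median effort-per-size ratio of the training projects) and a hypothetical `completion_order` field per project; neither is taken from the study itself. Leave-one-out trains on every other project, including ones completed later, whereas the time-aware variant trains only on projects that precede the target in completion order.

```python
from statistics import median
from typing import List, Tuple

# Hypothetical record layout: (completion_order, size, effort).
Project = Tuple[int, float, float]


def productivity_estimate(train: List[Project], size: float) -> float:
    """Assumed estimator: size times the median effort-per-size ratio of the training set."""
    ratio = median(effort / s for _, s, effort in train)
    return ratio * size


def leave_one_out_errors(projects: List[Project]) -> List[float]:
    """Absolute errors when each project is predicted from ALL other projects,
    regardless of whether they were completed before or after it."""
    errors = []
    for i, (_, size, effort) in enumerate(projects):
        train = projects[:i] + projects[i + 1:]
        errors.append(abs(effort - productivity_estimate(train, size)))
    return errors


def time_aware_errors(projects: List[Project], min_train: int = 3) -> List[float]:
    """Absolute errors when each project is predicted only from projects that
    precede it in completion order (a growing window of history)."""
    ordered = sorted(projects, key=lambda p: p[0])
    errors = []
    for i in range(min_train, len(ordered)):
        _, size, effort = ordered[i]
        train = ordered[:i]  # only earlier projects are "known" at this point
        errors.append(abs(effort - productivity_estimate(train, size)))
    return errors
```

With illustrative (not real) data such as `[(1, 10, 20), (2, 10, 20), (3, 10, 20), (4, 10, 50)]`, leave-one-out scores the first three projects using information from the later, atypical fourth project, while the time-aware scheme never lets later projects inform earlier predictions. This is the kind of discrepancy that makes the two approaches report different accuracy.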
RESULTS: First, we find that failing to take the temporal nature of the data into account leads to unreliable estimates of the models' predictive efficacy. Second, for this organization, prior-phase effort data could be used to improve the management of subsequent process tasks.
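Mid-project re-estimation from completed-phase effort can be sketched as follows. This is an illustrative assumption, not the model used in the study: the next phase's effort is predicted as the cumulative effort of the completed phases multiplied by the median ratio (next-phase effort / cumulative prior effort) observed in historical projects. The phase list, in particular the `testing` phase after implementation, and the function name are hypothetical.

```python
from statistics import median
from typing import Dict, List

# Hypothetical phase order; "testing" stands in for whatever follows implementation.
PHASES = ["requirements", "design", "implementation", "testing"]


def reestimate_next_phase(history: List[Dict[str, float]],
                          current: Dict[str, float],
                          next_phase: str) -> float:
    """Predict the effort of `next_phase` for an in-progress project from the
    effort already spent on the phases that precede it."""
    idx = PHASES.index(next_phase)
    if idx == 0:
        raise ValueError("no completed phases precede the first phase")
    done = PHASES[:idx]
    missing = [p for p in done if p not in current]
    if missing:
        raise ValueError(f"current project lacks effort for: {missing}")
    # Ratio of next-phase effort to cumulative prior effort in each historical project.
    ratios = [proj[next_phase] / sum(proj[p] for p in done) for proj in history]
    return median(ratios) * sum(current[p] for p in done)
```

Each time a phase finishes, its actual effort can be added to `current` and the next phase re-estimated, so the estimate is refreshed as within-project data accumulate rather than fixed at project start.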
CONCLUSION: We should collect time-related data and use it in our analyses. Failure to do so may lead to incorrect conclusions and may also inhibit industrial uptake of our research.