Maximising data retention from the ISBSG repository

aut.researcher: MacDonell, Stephen Gerard
dc.contributor.author: Deng, K
dc.contributor.author: MacDonell, SG
dc.date.accessioned: 2011-08-23T09:39:41Z
dc.date.available: 2011-08-23T09:39:41Z
dc.date.copyright: 2008-06-26
dc.date.issued: 2008-06-26
dc.description.abstract: Background: In 1997 the International Software Benchmarking Standards Group (ISBSG) began to collect data on software projects. Since then they have provided copies of their repository to researchers and practitioners, through a sequence of releases of increasing size. Problem: Questions over the quality and completeness of the data in the repository have led some researchers to discard substantial proportions of the data in terms of observations, and to discount the use of some variables in the modelling of, among other things, software development effort. In some cases the details of the discarding of data have received little mention and minimal justification. Method: We describe the process we used in attempting to maximise the amount of data retained for modelling software development effort at the project level, based on previously completed projects that had been sized using IFPUG/NESMA function point analysis (FPA) and recorded in the repository. Results: Through justified formalisation of the data set and domain-informed refinement we arrive at a final usable data set comprising 2862 (of 3024) observations across thirteen variables. Conclusion: A methodical approach to the pre-processing of data can help to ensure that as much data is retained for modelling as possible. Assuming that the data does reflect one or more underlying models, such retention should increase the likelihood of robust models being developed.
dc.identifier.citation: In proceedings of the 12th International Conference on Evaluation and Assessment in Software Engineering (EASE), University of Bari, Italy, pp. on CD-ROM
dc.identifier.uri: https://hdl.handle.net/10292/1830
dc.publisher: The British Computer Society (BCS)
dc.relation.uri: http://www.bcs.org/content/conWebDoc/19533
dc.rights: © The British Computer Society (BCS) and contributors 2008. All Rights Reserved. The content is copyright of BCS and of contributors who have exclusively licensed such copyright to BCS. Authors retain the right to place his/her publication version of the work on a personal website or institutional repository for non-commercial purposes only. A definitive version was subsequently published in (see Citation). The original publication is available at (see Publisher's Version).
dc.rights.accessrights: OpenAccess
dc.subject: Empirical software engineering
dc.subject: ISBSG repository
dc.subject: Data formalisation
dc.subject: Effort prediction
dc.subject: Regression
dc.subject: FP
dc.title: Maximising data retention from the ISBSG repository
dc.type: Conference Contribution
pubs.organisational-data: /AUT
pubs.organisational-data: /AUT/Design & Creative Technologies
pubs.organisational-data: /AUT/PBRF Researchers
pubs.organisational-data: /AUT/PBRF Researchers/Design & Creative Technologies PBRF Researchers
pubs.organisational-data: /AUT/PBRF Researchers/Design & Creative Technologies PBRF Researchers/DCT C & M Computing
Files
Original bundle
Name:
Deng and MacDonell (2008) EASE.pdf
Size:
172.44 KB
Format:
Adobe Portable Document Format
License bundle
Name:
licence.htm
Size:
29.98 KB
Format:
Unknown data format