Support vector regression-based imputation in analogy-based software development effort estimation
Abstract:
Missing data (MD) is a widespread problem that can affect the ability to use data to construct effective software development effort estimation (SDEE) techniques. To deal with this challenge, several imputation techniques have been investigated in SDEE and k-nearest neighbors (KNN)-based imputation is still the most frequently used. To the best of our knowledge, no study has used support vector regression (SVR)-based imputation to construct accurate estimation techniques, in particular those based on analogy. This paper introduces a new imputation technique based on SVR for handling MD in two analogy-based SDEE techniques: classical analogy and fuzzy analogy. More specifically, we investigate whether the use of SVR instead of KNN in imputing MD improves the predictive performance of these two analogy-based techniques. A total of 1134 experiments were conducted involving seven datasets, SVR/KNN MD imputation techniques (KNN with Euclidean and Manhattan distances), three missingness mechanisms (missing completely at random, missing at random, non-ignorable missing), and MD percentages from 10% to 90%. The results suggest that the use of SVR imputation, rather than KNN imputation, may improve the prediction performance of both analogy-based techniques. Furthermore, we found that the impact of MD percentage upon effort prediction performance is reduced when using SVR rather than KNN. Moreover, fuzzy analogy generates better estimates in terms of the standardized accuracy measure than classical analogy regardless of the MD technique, the dataset used, the missingness mechanism, or the MD percentage.
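For readers unfamiliar with the KNN baseline named in the abstract, the following is a minimal sketch of KNN-based imputation with a choice of Euclidean or Manhattan distance. It is not the authors' implementation: the function name `knn_impute`, the parameter `k`, the use of the neighbors' mean as the imputed value, and the toy project data are all assumptions made for illustration. The sketch also assumes that at least one row has no missing values.

```python
import math

def knn_impute(rows, k=2, metric="euclidean"):
    """Fill None entries with the mean of the k nearest complete rows.

    Hypothetical helper for illustration; assumes at least one complete row.
    """
    complete = [r for r in rows if None not in r]
    imputed = []
    for row in rows:
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            imputed.append(list(row))
            continue
        # Distances are computed only over the features observed in this row.
        observed = [j for j, v in enumerate(row) if v is not None]

        def distance(other):
            diffs = [abs(row[j] - other[j]) for j in observed]
            if metric == "manhattan":
                return sum(diffs)
            return math.sqrt(sum(d * d for d in diffs))

        neighbors = sorted(complete, key=distance)[:k]
        filled = list(row)
        for j in missing:
            filled[j] = sum(n[j] for n in neighbors) / len(neighbors)
        imputed.append(filled)
    return imputed

# Toy data (invented): [size, effort]; the last project has missing effort.
projects = [[1.0, 10.0], [2.0, 20.0], [10.0, 100.0], [1.5, None]]
result = knn_impute(projects, k=2, metric="manhattan")
```

An SVR-based variant would replace the neighbor average with a regression model trained on the complete rows to predict each missing feature from the observed ones.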
Year of publication:
2018
Keywords:
- Support Vector Machine
- Missing data
- analogy-based software development effort estimation
- K-nearest neighbors
- imputation
Source:
Document type:
Article
Status:
Restricted access
Knowledge areas:
- Machine learning
- Software
Subject areas:
- Computer science