Support vector regression-based imputation in analogy-based software development effort estimation


Abstract:

Missing data (MD) is a widespread problem that can affect the ability to use data to construct effective software development effort estimation (SDEE) techniques. To deal with this challenge, several imputation techniques have been investigated in SDEE, and k-nearest neighbors (KNN)-based imputation is still the most frequently used. To the best of our knowledge, no study has used support vector regression (SVR)-based imputation to construct accurate estimation techniques, in particular those based on analogy. This paper introduces a new imputation technique based on SVR for handling MD in two analogy-based SDEE techniques: classical analogy and fuzzy analogy. More specifically, we investigate whether using SVR instead of KNN to impute MD improves the predictive performance of these two analogy-based techniques. A total of 1134 experiments were conducted involving seven datasets, SVR and KNN MD imputation techniques (KNN with Euclidean and Manhattan distances), three missingness mechanisms (missing completely at random, missing at random, and non-ignorable missing), and MD percentages from 10% to 90%. The results suggest that using SVR imputation rather than KNN imputation may improve the prediction performance of both analogy-based techniques. Furthermore, we found that the impact of the MD percentage on effort prediction performance is smaller when SVR is used rather than KNN. Moreover, fuzzy analogy generates better estimates than classical analogy in terms of the standardized accuracy measure, regardless of the MD technique, the dataset used, the missingness mechanism, or the MD percentage.
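To make the KNN imputation baseline concrete, the following minimal Python sketch fills each missing cell with the mean of that attribute over the k nearest complete rows, using either Euclidean or Manhattan distance. It is an illustrative assumption, not the authors' implementation: the function names (`knn_impute`, `euclidean`, `manhattan`), the use of complete rows as the donor pool, the unweighted mean, and the omission of feature scaling are all hypothetical simplifications. SVR-based imputation, by contrast, generally fits a regression model on observed values to predict each missing attribute; that technique is not sketched here.

```python
import math
from typing import Callable, List, Optional, Sequence

Row = List[Optional[float]]  # None marks a missing value


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_impute(
    data: List[Row],
    k: int = 2,
    distance: Callable[[Sequence[float], Sequence[float]], float] = euclidean,
) -> List[List[float]]:
    """Hypothetical helper: replace each None with the mean of that attribute
    over the k nearest complete rows (no feature scaling, unweighted mean)."""
    donors = [row for row in data if all(v is not None for v in row)]
    if len(donors) < k:
        raise ValueError("need at least k complete rows to serve as donors")
    result: List[List[float]] = []
    for row in data:
        observed = [j for j, v in enumerate(row) if v is not None]
        if len(observed) == len(row):
            result.append([float(v) for v in row])
            continue
        # Compare rows only on the attributes observed in the incomplete row.
        target = [row[j] for j in observed]
        nearest = sorted(
            donors, key=lambda d: distance(target, [d[j] for j in observed])
        )[:k]
        # Keep observed values; fill each missing one with the donors' mean.
        result.append([
            v if v is not None else sum(d[j] for d in nearest) / k
            for j, v in enumerate(row)
        ])
    return result


if __name__ == "__main__":
    # Hypothetical projects: [size, team_size, effort]; the last effort is missing.
    projects = [
        [1.0, 10.0, 100.0],
        [2.0, 20.0, 200.0],
        [3.0, 30.0, 300.0],
        [10.0, 100.0, 1000.0],
        [2.5, 25.0, None],
    ]
    print(knn_impute(projects, k=2)[-1])                      # [2.5, 25.0, 250.0]
    print(knn_impute(projects, k=2, distance=manhattan)[-1])  # [2.5, 25.0, 250.0]
```

On this toy data both distances pick the same two donors, so the results coincide; on real SDEE datasets, where attributes have very different ranges, the choice of distance (and whether attributes are normalized first) can change which projects act as neighbors.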

Year of publication:

2018

Keywords:

  • Support Vector Machine
  • Missing data
  • analogy-based software development effort estimation
  • K-nearest neighbors
  • imputation

Source:

Scopus

Document type:

Article

Status:

Restricted access

Knowledge areas:

  • Machine learning
  • Software

Subject areas:

  • Computer science