Systematic Review of Machine Learning-Based Open-Source Software Maintenance Effort Estimation


Abstract:

Background: Software maintenance is a laborious activity in the software lifecycle and is often considered more expensive than other activities. Open-source software (OSS) has recently gained considerable acceptance in industry, and the maintenance effort estimation (MEE) of such software has emerged as an important research topic. In this context, researchers have conducted a number of open-source software maintenance effort estimation (O-MEE) studies based on statistical as well as machine learning (ML) techniques for better estimation. Objective: The objective of this study is to perform a systematic literature review (SLR) that analyzes and summarizes the empirical evidence on ML techniques for O-MEE in current research through a set of five research questions (RQs) covering several criteria (e.g., data pre-processing tasks, data mining tasks, parameter tuning methods, accuracy criteria and statistical tests, and the ML techniques reported in the literature as outperforming others). Methods: We performed an SLR of 36 primary empirical studies published from 2000 to June 2020, selected through an automated search of six digital databases. Results: The findings show that Bayesian networks, decision trees, support vector machines, and instance-based reasoning were the most used ML techniques; few studies opted for ensemble or hybrid techniques. Researchers have paid little attention to O-MEE data pre-processing in terms of feature selection, methods that handle missing values and imbalanced datasets, and parameter tuning of ML techniques. Classification is the data mining task most addressed, evaluated using accuracy criteria such as Precision, Recall, and Accuracy, as well as the Wilcoxon and Mann-Whitney statistical tests. Conclusion: This SLR identifies a number of gaps in current research and suggests areas for further investigation. For instance, since OSS includes different data source formats, researchers should pay more attention to data pre-processing and develop new models using ensemble techniques, since these have proved to perform better.

Publication year:

2023

Keywords:

  • empirical studies
  • maintenance effort estimation
  • data pre-processing
  • open-source software
  • machine learning techniques
  • ensemble techniques

Source:

Scopus

Document type:

Review

Status:

Restricted access

Knowledge areas:

  • Software engineering
  • Software

Subject areas:

  • Computer science