A Dataset for Analysis of Quality Code and Toxic Comments


Abstract:

Software development has an important human aspect: developers’ feelings are known to have a significant impact on software development and can affect quality, productivity and performance. In this study, we begin the process of finding and understanding these affects and relating them to software quality. We propose a quality code and sentiments dataset: a clean set of commits, code quality and toxic sentiments from 19 projects obtained from GitHub. The dataset combines the messages of the commits in GitHub with quality metrics from SonarQube. Using this information, we apply machine learning techniques with the ML.NET tool to identify toxic developer sentiments in commits that could affect code quality. We analyzed 218K commits from the 19 selected projects; the analysis took 120 days. We also describe the process of building the tool and retrieving the data. The dataset will be used to investigate in depth the factors that affect developers’ emotions and whether these factors are related to code quality over the life cycle of a software project. In addition, code quality will be estimated as a function of developer sentiments.

Year of publication:

2023

Keywords:

  • Software Engineering
  • GitHub
  • Commits
  • Sentiment Analysis
  • SonarQube
  • Software quality
  • Toxic comment classification

Source:

Scopus

Document type:

Conference Object

Status:

Restricted access

Knowledge areas:

  • Data analysis
  • Computer science

Subject areas:

  • Social interaction
  • Computer programming, programs, data, security
  • Operations of libraries and archives