Exploring ISBSG R12 Dataset Using Multi-data Analytics


Abstract:

This paper presents an exploratory study that applies three data analysis techniques (statistical analysis, data clustering, and visualization) to the ISBSG R12 data set. Both SPSS and RapidMiner (RM) are used to conduct the analysis. While the main advantage of statistical analysis is the summarization of data, the overall behavior of the data is lost, particularly the view of outlier values. The study applied two statistical techniques using SPSS: correlation analysis and the general linear model with multiple variables. The statistical analysis showed a highly significant relationship between and among the selected variables. In the data mining area, clustering and visualization were carried out with both SPSS and RapidMiner. For the selected variables, the number of clusters was determined after several runs, in an attempt to split the one large cluster into several sub-clusters. Finally, the visualization technique demonstrates how it can show concentration and trends. Statistical analysis found a high correlation between speed of delivery and manpower delivery rate, and between the independent factors of industry type and development methodology and the dependent variable of defect density. The clustering process highlighted the importance of variables related to work effort and defects in forming the clusters. The visualization charts revealed an inverse non-linear relationship between, on one side, the effort of analysis and design relative to total effort and the speed of delivery and, on the other side, total defects delivered. Overall, multiple views of data analytics are needed to arrive at a clear and consistent understanding of the underlying behavior of the data in a complex data set such as ISBSG.

Publication year:

2020

Keywords:

  • Visualization
  • Clustering
  • SPSS
  • RapidMiner
  • Data Mining
  • ISBSG release 12
  • Correlation
  • Data analytics

Source:

Scopus

Document type:

Conference Object

Status:

Restricted access

Knowledge areas:

  • Data analysis
  • Computer science

Subject areas:

  • Computer science
  • Special computer methods
  • Operations of libraries and archives