Exploring ISBSG R12 Dataset Using Multi-data Analytics
Abstract:
This paper presents an exploratory study that applies three data analysis techniques to the ISBSG R12 data set: statistical analysis, data clustering, and visualization. Both SPSS and RapidMiner (RM) are used to conduct the analysis. While the main advantage of statistical analysis is the summarization of data, the overall behavior of the data is lost, particularly the view of outlier values. The study applied two statistical techniques using SPSS: correlation analysis and the general linear model with multiple variables. The statistical analysis showed a highly significant relationship between and among the selected variables. On the data mining side, clustering and visualization were carried out with both SPSS and RapidMiner. For the selected variables, the number of clusters was determined after several runs, in an attempt to split the one large cluster into several sub-clusters. Finally, the visualization technique demonstrates how it can show concentrations and trends. Statistical analysis found a high correlation between speed of delivery and manpower delivery rate, and between the independent factors of industry type and development methodology and the dependent variable of defect density. The clustering process highlighted the importance of variables related to work effort and defects in forming the clusters. The visualization charts revealed an inverse nonlinear relationship between the analysis and design effort out of total effort and speed of delivery on one side, and total defects delivered on the other. Overall, multiple views of data analytics are needed to arrive at a clear and consistent understanding of the underlying behavior of the data in a complex data set such as ISBSG.
Publication year:
2020
Keywords:
- Visualization
- Clustering
- SPSS
- RapidMiner
- Data Mining
- ISBSG release 12
- Correlation
- Data analytics
Source:

Document type:
Conference Object
Status:
Restricted access
Knowledge areas:
- Data analysis
- Computer science
Subject areas:
- Computer science
- Special computer methods
- Operations of libraries and archives