Analysis model of the most important factors in Covid-19 through data mining, descriptive statistics and random forest

Abstract:

The Covid-19 pandemic has had a major impact worldwide. Despite its relatively low mortality rate, it has become a serious problem because of the demand for care it places on hospitals and clinics, since the disease spreads rapidly from person to person. In this paper we propose a classification-oriented machine learning method. We follow a classic data science process: we clean noise from the data and preprocess it, then carry out a descriptive statistical analysis so that the most important variables or factors can be identified through unsupervised learning. The results show that the variables most strongly associated with the risk of infection and mortality from Covid-19 are conditions that affect the immune system, such as diabetes, heart disease, hypertension, and kidney disease; these conditions can also cause serious kidney problems. Our method is evaluated using quality measures. Finally, this work opens the door to further research focused on each variable related to Covid-19 individually, with the aim of finding relevant information that can help improve the current situation.
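The cleaning and descriptive-statistics steps described above can be illustrated with a minimal sketch. It assumes each patient record carries binary (0/1) comorbidity flags and a binary outcome. The column names (`diabetes`, `heart_disease`, `hypertension`, `kidney_disease`, `died`) and the helpers `clean`, `mortality_ratio`, and `rank_factors` are hypothetical, and the toy records are invented for illustration only, not taken from the study's data. Factors are ranked here by a simple mortality rate ratio (the death rate with the condition divided by the death rate without it). This is a swapped-in descriptive technique, not the paper's random forest model. A random forest ranking would be a separate step, for example using the `feature_importances_` attribute of scikit-learn's `RandomForestClassifier`.

```python
from typing import Dict, List, Optional

# Hypothetical binary comorbidity columns; a real dataset's names may differ.
FACTORS = ["diabetes", "heart_disease", "hypertension", "kidney_disease"]
OUTCOME = "died"


def clean(records: List[Dict[str, Optional[int]]]) -> List[Dict[str, int]]:
    """Noise cleaning: keep only records whose factor and outcome values are 0 or 1."""
    keys = FACTORS + [OUTCOME]
    return [
        {k: int(r[k]) for k in keys}
        for r in records
        if all(r.get(k) in (0, 1) for k in keys)
    ]


def mortality_ratio(records: List[Dict[str, int]], factor: str) -> float:
    """Death rate among patients with the factor divided by the rate among those without it."""
    with_f = [r[OUTCOME] for r in records if r[factor] == 1]
    without_f = [r[OUTCOME] for r in records if r[factor] == 0]
    if not with_f or not without_f:
        return float("nan")  # ratio undefined when one group is empty
    rate_with = sum(with_f) / len(with_f)
    rate_without = sum(without_f) / len(without_f)
    if rate_without == 0:
        return float("inf") if rate_with > 0 else float("nan")
    return rate_with / rate_without


def rank_factors(records: List[Dict[str, int]]) -> List[tuple]:
    """Rank factors by descending ratio; NaN (undefined) ratios are placed last."""
    scores = [(f, mortality_ratio(records, f)) for f in FACTORS]
    return sorted(scores, key=lambda kv: (kv[1] != kv[1], -kv[1]))


def row(diabetes, heart, hyper, kidney, died):
    """Build one hypothetical patient record."""
    return {"diabetes": diabetes, "heart_disease": heart, "hypertension": hyper,
            "kidney_disease": kidney, "died": died}


# Invented toy records; the last one has a missing value and is removed by clean().
toy = [
    row(1, 0, 1, 0, 1),
    row(1, 1, 0, 0, 1),
    row(0, 0, 1, 0, 0),
    row(0, 0, 0, 0, 0),
    row(0, 1, 0, 0, 0),
    row(1, 0, 0, 0, 0),
    row(None, 0, 0, 0, 1),
]

cleaned = clean(toy)
ranked = rank_factors(cleaned)
print(ranked)
# [('diabetes', inf), ('heart_disease', 2.0), ('hypertension', 2.0), ('kidney_disease', nan)]
```

With real data the same ranking idea applies, but a larger sample and a proper model, such as the random forest named in the title, would be needed before drawing conclusions about any factor.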