A Novel Framework for Fast Feature Selection Based on Multi-Stage Correlation Measures
Abstract:
Datasets with thousands of features represent a challenge for many of the existing learning methods because of the well known curse of dimensionality. Not only that, but the presence of irrelevant and redundant features on any dataset can degrade the performance of any model where training and inference is attempted. In addition, in large datasets, the manual management of features tends to be impractical. Therefore, the increasing interest of developing frameworks for the automatic discovery and removal of useless features through the literature of Machine Learning. This is the reason why, in this paper, we propose a novel framework for selecting relevant features in supervised datasets based on a cascade of methods where speed and precision are in mind. This framework consists of a novel combination of Approximated and Simulate Annealing versions of the Maximal Information Coefficient (MIC) to generalize the simple linear relation between features. This process is performed in a series of steps by applying the MIC algorithms and cutoff strategies to remove irrelevant and redundant features. The framework is also designed to achieve a balance between accuracy and speed. To test the performance of the proposed framework, a series of experiments are conducted on a large battery of datasets from SPECTF Heart to Sonar data. The results show the balance of accuracy and speed that the proposed framework can achieve.
Año de publicación:
2022
Keywords:
- Machine learning
- python framework
- feature selection
Fuente:

Tipo de documento:
Article
Estado:
Acceso abierto
Áreas de conocimiento:
- Aprendizaje automático
- Algoritmo
- Ciencias de la computación
Áreas temáticas:
- Programación informática, programas, datos, seguridad
- Funcionamiento de bibliotecas y archivos
- Métodos informáticos especiales