Clustering algorithm using rough set theory for unsupervised feature selection

Abstract:

The amount of data available to describe real-world problems is growing considerably as more measurable characteristics (features) can be collected. Machine learning techniques are widely used to extract valuable knowledge from data, but their performance may degrade when appropriate features are not selected. Feature selection searches for relations among features in order to reveal those that are redundant or irrelevant for a given case study; the search can be supervised or unsupervised. In this work, we propose an unsupervised feature selection algorithm that uses (1) relative dependency to measure similarity between features, (2) a clustering algorithm to group similar features, and (3) a procedure that selects the most representative feature of each group to obtain a reduced feature space. The relative dependency degree between pairs of attributes is used to compute a similarity measure, which a clustering algorithm then uses to cluster attributes through KNN and prototype-based clustering. The proposal is tested on well-known benchmarks and compared with classic supervised and unsupervised feature selection techniques. Additionally, the proposal is evaluated on a real-world application: fault diagnosis in rotating machinery.
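
The pairwise similarity and the representative-selection steps can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one common rough-set definition of relative dependency for discrete attributes, namely the number of distinct values of attribute a divided by the number of distinct (a, b) value pairs. Averaging the two directions to obtain a symmetric similarity, and choosing the feature with the highest total similarity inside a cluster as its representative, are further assumptions, as are all function names. The clustering step itself (KNN and prototype-based) is not specified in the abstract and is left out.

```python
from itertools import combinations


def relative_dependency(rows, a, b):
    """Assumed definition: |distinct values of a| / |distinct (a, b) pairs|.

    Equals 1.0 when the value of attribute a fully determines attribute b.
    """
    distinct_a = {row[a] for row in rows}
    distinct_ab = {(row[a], row[b]) for row in rows}
    return len(distinct_a) / len(distinct_ab)


def similarity(rows, a, b):
    # Assumption: make the measure symmetric by averaging both directions.
    return 0.5 * (relative_dependency(rows, a, b)
                  + relative_dependency(rows, b, a))


def similarity_matrix(rows, n_features):
    """Pairwise feature similarity; the diagonal is 1.0 by construction."""
    sim = [[1.0] * n_features for _ in range(n_features)]
    for a, b in combinations(range(n_features), 2):
        sim[a][b] = sim[b][a] = similarity(rows, a, b)
    return sim


def pick_representative(sim, cluster):
    # Assumption: the representative is the feature with the highest
    # total similarity to the members of its cluster.
    return max(cluster, key=lambda f: sum(sim[f][g] for g in cluster))


# Toy discrete data set: features 0 and 1 are identical, feature 2 is unrelated.
rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
sim = similarity_matrix(rows, 3)
print(sim[0][1], sim[0][2])
```

On this toy data the two identical features receive the maximum similarity, while the unrelated feature scores lower against both, so a clustering step would be expected to group features 0 and 1 and keep only one of them.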