Application of genetic algorithms and constructive neural networks for the analysis of microarray cancer data
Abstract:
Background: Extracting relevant information from microarray data is a very complex task because of the characteristics of the data sets: they comprise a large number of features, while few samples are generally available. In this sense, feature selection is a very important aspect of the analysis, helping both to identify relevant genes and to maximize predictive information.
Methods: Due to its simplicity and speed, Stepwise Forward Selection (SFS) is a widely used feature selection technique. In this work, we carry out a comparative study of SFS and Genetic Algorithms (GA) as general frameworks for the analysis of microarray data, with the aim of identifying groups of genes with high predictive capability and biological relevance. Six standard and machine learning-based techniques (Linear Discriminant Analysis (LDA), Support Vector Machines (SVM), Naive Bayes (NB), the C-MANTEC constructive neural network, K-Nearest Neighbors (kNN) and Multilayer Perceptron (MLP)) are used within both frameworks on six freely available public datasets for the task of predicting cancer outcome.
Results: Better cancer outcome prediction results were obtained using the GA framework. Note, however, that in comparison to SFS this approach leads to a larger selection set, performs a large number of comparisons between genetic profiles and is thus computationally more intensive. The GA framework also yielded a set of genes that can be considered more biologically relevant. Regarding the different classifiers, standard feedforward neural networks (MLP), LDA and SVM led to similar and best results, while C-MANTEC and kNN followed closely but with lower accuracy. Further, C-MANTEC, MLP and LDA selected a more limited set of genes than SVM, NB and kNN, and C-MANTEC in particular proved to be the most robust classifier with respect to changes in the parameter settings.
Conclusions: This study shows that if prediction accuracy is the objective, the GA-based approach leads to better results than the SFS approach, independently of the classifier used. Regarding classifiers, even though C-MANTEC did not achieve the best overall results, its performance was competitive and very robust with respect to the parameters of the algorithm, so it can be considered a candidate technique for future studies. © 2014 Luque-Baena et al.; licensee BioMed Central Ltd.
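The abstract contrasts SFS with GA-based selection but does not spell out either search. The following is a minimal sketch of both as wrapper methods, not the authors' implementation. It assumes a gene subset is scored by the leave-one-out accuracy of a 1-nearest-neighbour classifier. kNN is one of the six classifiers in the study, but this particular criterion, the GA operators (truncation selection, one-point crossover, bit-flip mutation), the size penalty and all parameter values are illustrative assumptions. The functions `make_toy_data`, `loo_1nn_accuracy`, `stepwise_forward_selection` and `genetic_selection` are hypothetical names. The data are synthetic, with gene 0 made informative by construction, and are not taken from the study's datasets.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def make_toy_data(n_per_class: int = 15, n_genes: int = 10,
                  seed: int = 1) -> Tuple[Matrix, List[int]]:
    """Hypothetical synthetic data: gene 0 separates the classes, other genes are noise."""
    rng = random.Random(seed)
    X: Matrix = []
    y: List[int] = []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.random() for _ in range(n_genes)]  # noise genes in [0, 1)
            row[0] += 5.0 * label                          # informative gene
            X.append(row)
            y.append(label)
    return X, y


def loo_1nn_accuracy(X: Matrix, y: Sequence[int], genes: Sequence[int]) -> float:
    """Leave-one-out accuracy of 1-NN using only the given gene indices (assumed criterion)."""
    if not genes:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_dist, best_label = float("inf"), None
        for j in range(len(X)):
            if i == j:
                continue
            d = sum((X[i][g] - X[j][g]) ** 2 for g in genes)
            if d < best_dist:
                best_dist, best_label = d, y[j]
        correct += int(best_label == y[i])
    return correct / len(X)


def stepwise_forward_selection(X: Matrix, y: Sequence[int],
                               max_features: int = 5) -> Tuple[List[int], float]:
    """Greedily add the gene that most improves accuracy; stop when nothing improves."""
    selected: List[int] = []
    best_score = 0.0
    remaining = set(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scores = {g: loo_1nn_accuracy(X, y, selected + [g]) for g in remaining}
        g_best = max(sorted(scores), key=lambda g: scores[g])  # ties -> lowest index
        if scores[g_best] <= best_score:
            break
        selected.append(g_best)
        remaining.remove(g_best)
        best_score = scores[g_best]
    return selected, best_score


def genetic_selection(X: Matrix, y: Sequence[int], pop_size: int = 20,
                      generations: int = 25, p_mutation: float = 0.05,
                      size_penalty: float = 0.01, seed: int = 0) -> Tuple[List[int], float]:
    """Bit-string GA over gene masks; needs pop_size >= 4 and at least two genes."""
    rng = random.Random(seed)
    n_genes = len(X[0])

    def genes_of(mask: List[bool]) -> List[int]:
        return [i for i, bit in enumerate(mask) if bit]

    def fitness(mask: List[bool]) -> float:
        # Accuracy minus a small (assumed) penalty per selected gene.
        genes = genes_of(mask)
        return loo_1nn_accuracy(X, y, genes) - size_penalty * len(genes)

    population = [[rng.random() < 0.1 for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]         # keep the fitter half
        children: List[List[bool]] = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)         # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not bit) if rng.random() < p_mutation else bit for bit in child]
            children.append(child)
        population = parents + children
    genes = genes_of(max(population, key=fitness))
    return genes, loo_1nn_accuracy(X, y, genes)


X, y = make_toy_data()
sfs_genes, sfs_acc = stepwise_forward_selection(X, y, max_features=5)
ga_genes, ga_acc = genetic_selection(X, y)
print("SFS:", sfs_genes, sfs_acc)
print("GA: ", ga_genes, ga_acc)
```

In this sketch, SFS evaluates at most one subset per remaining gene at each step and stops as soon as no single addition improves accuracy. The GA evaluates roughly `pop_size × generations` subsets. This difference illustrates, on a toy scale, why the GA framework is described as computationally more intensive.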
Year of publication:
2014
Keywords:
- Genetic Algorithms
- Feature Selection
- Constructive Neural Networks
- Microarray
Source:
Document type:
Article
Status:
Open access
Knowledge areas:
- Machine learning
- Genetics
- Computer science
Subject areas:
- Computer programming, programs, data, security
- Patents
- Human physiology