Chemometrics for QSAR with low sequence homology: Mycobacterial promoter sequences recognition with 2D-RNA entropies


Abstract:

Predicting mycobacterial promoter sequences for protein synthesis is important in the study of protein metabolism regulation. This goal is, however, considered a challenging computational biology task because of low inter-sequence homology. Consequently, a previous work based only on DNA sequence had to use a large input parameter set and a multilayered feed-forward ANN architecture, trained with the error back-propagation algorithm, to reach an overall accuracy of 97% [Kalate et al., 2003. Comput. Biol. Chem. 27, 555-564]. One could therefore expect that a notably simpler model may be derived using parameters based on non-linear structural information. In the present work, a method based on molecular folding negentropies (Θ_k) is introduced to predict, for the first time, mycobacterial promoter sequences (mps) from the corresponding RNA secondary structure. The best QSAR equation found was the classification function mps = 4.921 × ⁰Θ_M - 1.205, which recognised 126/135 mps (93.3%) and 100% of 245 control sequences (cs). The model showed a very high Matthews correlation coefficient, C = 0.949. Both average overall accuracy and predictability were 97.6%. Additionally, several machine learning algorithms were applied to reaffirm the validity of the LDA model from the chemometrics point of view. This linear model, with only one parameter (⁰Θ_M), may be considered by far the simplest reported to date, with no loss of accuracy (97%) with respect to the model of Kalate et al. © 2006 Elsevier B.V. All rights reserved.
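
The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' code: it rebuilds the confusion counts from the reported results (126 of 135 promoters and all 245 controls recognised) and recomputes sensitivity, overall accuracy and the Matthews coefficient C. The function names `classify_mps` and `matthews_cc` are hypothetical, and the decision rule "score above zero means promoter" is an assumption, since the abstract gives the discriminant function but not its threshold.

```python
from math import sqrt


def classify_mps(theta_0_m: float) -> bool:
    """Score a sequence with the reported one-parameter function.

    ASSUMPTION: a positive score is read as "promoter"; the abstract
    does not state the cut-off used by the authors.
    """
    score = 4.921 * theta_0_m - 1.205
    return score > 0


def matthews_cc(tp: int, fn: int, tn: int, fp: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Counts implied by the abstract: 126/135 promoters, 245/245 controls.
tp, fn = 126, 135 - 126
tn, fp = 245, 0

sensitivity = tp / (tp + fn)
accuracy = (tp + tn) / (tp + fn + tn + fp)
mcc = matthews_cc(tp, fn, tn, fp)

print(f"sensitivity = {sensitivity:.1%}")
print(f"accuracy    = {accuracy:.1%}")
print(f"C           = {mcc:.3f}")
```

Under these counts the recomputed values agree with the reported 93.3%, 97.6% and C = 0.949. With the assumed zero threshold, the decision boundary falls at ⁰Θ_M = 1.205 / 4.921, roughly 0.245.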

Publication year:

2007

Keywords:

  • Mycobacterial promoter sequences
  • Entropy
  • Markov models
  • Information theory
  • Machine learning algorithms
  • QSAR
  • RNA secondary structure

Source:

Scopus

Document type:

Article

Status:

Restricted access

Knowledge areas:

  • Quantitative structure-activity relationship
  • Molecular biology

Dewey subject areas:

  • Computer science
  • Linguistics
  • Biology
Processed with AI

Sustainable Development Goals:

  • SDG 3: Good health and well-being
  • SDG 4: Quality education
  • SDG 9: Industry, innovation and infrastructure
Processed with AI