Chemometrics for QSAR with low sequence homology: Mycobacterial promoter sequences recognition with 2D-RNA entropies


Abstract:

Predicting mycobacterial promoter sequences for protein synthesis is important in the study of the regulation of protein metabolism. This goal is, however, considered a challenging computational biology task because of low inter-sequence homology. Consequently, a previous work based only on DNA sequence had to use a large input parameter set and a multilayered feed-forward ANN architecture, trained with the error back-propagation algorithm, to reach an overall accuracy of up to 97% [Kalate et al., 2003. Comput. Biol. Chem. 27, 555-564]. One could therefore expect that a notably simpler model may be derived using parameters based on non-linear structural information. In the present work, a method based on molecular folding negentropies (Θk) is introduced to predict, for the first time, mycobacterial promoter sequences (mps) from the corresponding RNA secondary structure. The best QSAR equation found was the classification function mps = 4.921 × 0ΘM - 1.205, which recognised 126/135 mps (93.3%) and 100% of 245 control sequences (cs). The model showed a very high Matthews correlation coefficient, C = 0.949. Both average overall accuracy and predictability were 97.6%. Additionally, several machine learning algorithms were applied to confirm the validity of the LDA model from the chemometrics point of view. This linear model with only one parameter (0ΘM) may be considered by far the simplest reported to date, with no loss of accuracy (97%) with respect to the model of Kalate et al. © 2006 Elsevier B.V. All rights reserved.
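
The figures quoted in the abstract can be checked with a few lines of Python. The sketch below is illustrative only and is not the authors' code. It assumes the confusion matrix implied by the text (126 of 135 mps recognised, all 245 cs recognised, hence 9 false negatives and 0 false positives). The helper names `mps_score` and `confusion_metrics` are hypothetical, and classifying a sequence as a promoter when the score is positive is an assumption about the sign convention that the abstract does not state.

```python
import math


def mps_score(theta_0_m):
    """Classification function quoted in the abstract: 4.921 * 0ΘM - 1.205."""
    return 4.921 * theta_0_m - 1.205


def confusion_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, accuracy and the Matthews coefficient."""
    total = tp + fn + tn + fp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / total,
        # Matthews correlation coefficient; defined as 0 when the denominator is 0
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Counts implied by the abstract (assumption: no other sequences were scored).
metrics = confusion_metrics(tp=126, fn=9, tn=245, fp=0)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Assumed decision rule: score > 0 means "promoter". Under that assumption the
# boundary lies at 0ΘM = 1.205 / 4.921.
boundary = 1.205 / 4.921
print(f"decision boundary (assumed): {boundary:.4f}")
```

Under these assumptions the computed accuracy and Matthews coefficient round to the 97.6% and 0.949 reported in the abstract, which supports reading C as the Matthews correlation coefficient.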

Year of publication:

2007

Keywords:

  • Mycobacterial promoter sequences
  • entropy
  • Markov models
  • information theory
  • Machine learning algorithms
  • QSAR
  • RNA secondary structure

Source:

Scopus

Document type:

Article

Status:

Restricted access

Knowledge areas:

  • Quantitative structure-activity relationship
  • Molecular biology

Subject areas:

  • Computer science
  • Linguistics
  • Biology