Exploring different strategies for imbalanced ADME data problem: case study on Caco-2 permeability modeling


Abstract:

In many absorption, distribution, metabolism, and excretion (ADME) modeling problems, imbalanced data could negatively affect classification performance of machine learning algorithms. Solutions for handling imbalanced dataset have been proposed, but their application for ADME modeling tasks is underexplored. In this paper, various strategies including cost-sensitive learning and resampling methods were studied to tackle the moderate imbalance problem of a large Caco-2 cell permeability database. Simple physicochemical molecular descriptors were utilized for data modeling. Support vector machine classifiers were constructed and compared using multiple comparison tests. Results showed that the models developed on the basis of resampling strategies displayed better performance than the cost-sensitive classification models, especially in the case of oversampling data where misclassification rates for minority class have values of 0.11 and 0.14 for training and test set, respectively. A consensus model with enhanced applicability domain was subsequently constructed and showed improved performance. This model was used to pbkp_redict a set of randomly selected high-permeability reference drugs according to the biopharmaceutics classification system. Overall, this study provides a comparison of numerous rebalancing strategies and displays the effectiveness of oversampling methods to deal with imbalanced permeability data problems.

Año de publicación:

2016

Keywords:

  • Cost-sensitive learning
  • Biopharmaceutics classification system
  • ADME modeling
  • Support Vector Machine
  • Resampling technique
  • Caco-2 cell permeability

Fuente:

scopusscopus

Tipo de documento:

Article

Estado:

Acceso restringido

Áreas de conocimiento:

  • Farmacología
  • Aprendizaje automático

Áreas temáticas:

  • Fisiología humana
  • Química física
  • Ingeniería química