Database fingerprint (DFP): an approach to represent molecular databases
Abstract:
Background: Molecular fingerprints are widely used in several areas of chemoinformatics including diversity analysis and similarity searching. The fingerprint-based analysis of chemical libraries, in particular of large collections, usually requires the molecular representation of each compound in the library that may lead to issues of storage space and redundant calculations. In fact, information redundancy is inherent to the data, resulting on binary digit positions in the fingerprint without significant information. Results: Herein is proposed a general approach to represent an entire compound library with a single binary fingerprint. The development of the database fingerprint (DFP) is illustrated first using a short fingerprint (MACCS keys) for 10 data sets of general interest in chemistry. The application of the DFP is further shown with PubChem fingerprints for the data sets used in the primary example but with a larger number of compounds, up to 25,000 molecules. The performance of DFP were studied through differential Shannon entropy, k-mean clustering, and DFP/Tanimoto similarity. Conclusions: The DFP is designed to capture key information of the compound collection and can be used to compare and assess the diversity of molecular libraries. This Preliminary Communication shows the potential of the novel fingerprint to conduct inter-library relationships. A major future goal is to apply the DFP for virtual screening and developing DFP for other data sets based on several different type of fingerprints.
Año de publicación:
2017
Keywords:
- DIVERSITY
- Information content
- Molecular fingerprints
- Shannon entropy
- similarity
Fuente:
Tipo de documento:
Article
Estado:
Acceso abierto
Áreas de conocimiento:
- Base de datos
- Ciencia de materiales
Áreas temáticas:
- Programación informática, programas, datos, seguridad
- Química y ciencias afines
- Física aplicada