Semantic data integration techniques for transforming big biomedical data into actionable knowledge


Abstract:

FAIR principles and the Open Data initiatives have motivated the publication of large volumes of data. Specifically, in the biomedical domain, the size of the data has increased exponentially in the last decade, and with the advances in the technologies to collect and generate data, a faster growth rate is expected for the next years. The available collections of data are characterized by the dominant dimensions of big data, i.e., they are not only large in volume, but they can be also heterogeneous and present quality issues. These data complexity problems impact on the typical tasks of data management, and particularly, in the task of integrating big biomedical data sources. We tackle the problem of big data integration and present a knowledge-driven framework able to extract and integrate data collected from structured and unstructured data sources. The proposed framework resorts to Natural Language Processing techniques to extract knowledge from unstructured data and short text. Furthermore, ontologies and controlled vocabularies, e.g., UMLS, are utilized to annotate the extracted entities and relations with terms from the ontology or controlled vocabulary. The annotated data is integrated into a knowledge graph. A unified schema is used to describe the meaning of the integrated data as well as the main properties and relations. As proof of concept, we show the results of applying the proposed framework to integrate clinical records from lung cancer patients with data extracted from open data sources like Drugbank and PubMed. The created knowledge graph enables the discovery of interactions between drugs in the treatments prescribed to lung cancer patients.

Año de publicación:

2019

Keywords:

  • semantic data integration
  • BIG DATA
  • Knowledge Graph
  • Natural Language processing
  • Biomedical Data

Fuente:

scopusscopus

Tipo de documento:

Conference Object

Estado:

Acceso restringido

Áreas de conocimiento:

  • Análisis de datos
  • Ciencias de la computación

Áreas temáticas:

  • Funcionamiento de bibliotecas y archivos