Ontario: Federated Query Processing Against a Semantic Data Lake
Abstract:
Data lakes enable flexible knowledge discovery and reduce the overhead of materialized data integration. Although data lakes are effective for data storage, query execution over them can be expensive, which calls for novel techniques that generate plans able to exploit the main characteristics of data lakes. We devise Ontario, a federated query processing approach tailored for large-scale heterogeneous data. Ontario provides efficient and effective query processing over a federation of heterogeneous data sources in a data lake. Ontario relies on source descriptions named RDF Molecule Templates, i.e., abstract descriptions of the properties of the entities in a unified schema and of their implementation in a data lake. We empirically evaluate the effectiveness of the Ontario optimization techniques over state-of-the-art benchmarks. The observed results suggest that Ontario can effectively select plans composed of subqueries that can be efficiently executed against heterogeneous data sources in a data lake.
Publication year:
2019
Keywords:
- Polystore
- Semantic Data Lake
- Federated engine
Source:
Document type:
Conference Object
Status:
Restricted access
Knowledge areas:
- Semantic Web
- Computer science
Subject areas:
- Computer programming, programs, data, security
- Operations of libraries and archives