A heuristic-based approach for planning federated SPARQL queries

Abstract:

A large number of SPARQL endpoints are available to access the Linked Open Data cloud, but query capabilities remain very limited. Thus, to support efficient semantic data management over federations of endpoints, existing SPARQL query engines need to be equipped with new functionalities. First, queries need to be decomposed into sub-queries that not only can be answered by the available endpoints, but can also be executed in a way that minimizes bandwidth usage. Second, query engines have to be able to gather the answers produced by the endpoints and merge them following a plan that reduces intermediate results. We address these problems and propose techniques that rely only on information about the predicates of the datasets accessible through the endpoints to identify bushy plans composed of sub-queries that can be efficiently executed. These techniques have been implemented on top of one existing RDF engine, and their performance has been studied on the FedBench benchmark. Experimental results show that our approach may support successful evaluation of queries when other federated query engines fail, either because endpoints are unable to execute the sub-queries or because the federated query plans are too expensive.
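
As a rough illustration of the two steps named above, the following Python sketch groups triple patterns into sub-queries using only a predicate-to-endpoint catalog, and then arranges the sub-queries into a bushy (balanced) join tree. It is a minimal sketch under stated assumptions, not the heuristics of the paper: the function names `decompose` and `bushy_plan`, the endpoint labels, and the catalog contents are all hypothetical, and the pairing step ignores shared variables and cost estimates that a real planner would consider.

```python
from collections import defaultdict


def decompose(triple_patterns, endpoint_predicates):
    """Group triple patterns by the set of endpoints whose datasets use their predicate.

    Hypothetical helper: patterns with the same candidate endpoints end up in
    one sub-query, so they can be shipped together instead of one by one.
    """
    groups = defaultdict(list)
    for s, p, o in triple_patterns:
        # An endpoint is a candidate source if its dataset contains the predicate.
        sources = frozenset(
            ep for ep, preds in endpoint_predicates.items() if p in preds
        )
        if not sources:
            raise ValueError(f"no endpoint offers predicate {p}")
        groups[sources].append((s, p, o))
    return dict(groups)


def bushy_plan(subqueries):
    """Join sub-queries pairwise, level by level, which yields a bushy tree.

    Simplification (assumption): neighbours are paired in list order,
    without checking join variables or estimated result sizes.
    """
    level = list(subqueries)
    if not level:
        raise ValueError("nothing to plan")
    while len(level) > 1:
        paired = [
            ("JOIN", level[i], level[i + 1])
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            # An odd element is carried up unchanged to the next level.
            paired.append(level[-1])
        level = paired
    return level[0]


# Hypothetical catalog: which predicates each endpoint's dataset uses.
catalog = {
    "endpointA": {"foaf:name", "foaf:knows"},
    "endpointB": {"dbo:birthPlace"},
}
query = [
    ("?p", "foaf:name", "?n"),
    ("?p", "foaf:knows", "?q"),
    ("?q", "dbo:birthPlace", "?c"),
]
subqueries = decompose(query, catalog)
plan = bushy_plan(["sq1", "sq2", "sq3", "sq4"])
```

In a bushy tree both inputs of a join may themselves be joins, unlike a left-linear plan where one input is always a single sub-query. This is what allows independent sub-queries to be evaluated separately and their (ideally small) results merged afterwards.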