The BAY-HIST prediction model for RDF documents


Abstract:

In real-world RDF documents, property subject and object values are often correlated. The identification of these relationships is of significant relevance to many applications, e.g., query evaluation planning and linking analysis. In this paper we present the BAY-HIST Prediction Model, a combination of Bayesian networks and multidimensional histograms which is able to identify the probability of these dependencies. In general, Bayesian networks assume a small number of discrete values for each of the variables considered in the network. However, in the context of the Semantic Web, variables that represent the concepts in large RDF documents may contain a very large number of values; thus, BAY-HIST implements multidimensional histograms in order to aggregate the data associated with each node in the network. We illustrate the benefits of applying BAY-HIST to the problem of query selectivity estimation as part of cost-based query optimization. We report initial experimental results on the predictive capability of this model and the effectiveness of our optimization techniques when used together with BAY-HIST. The results suggest that the quality of the optimal evaluation plan has improved over the plan identified by existing cost models that assume independence and uniform distribution of the data values.
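
The contrast the abstract draws, between a cost model that assumes independence and one that captures subject/object correlation through a multidimensional histogram, can be illustrated with a toy sketch. This is not the BAY-HIST implementation: the equi-width bucketing, the function names (`bucket`, `build_histograms`, `selectivity_independent`, `selectivity_joint`) and the synthetic data are all assumptions made for illustration, and the Bayesian network is reduced to a single pair of variables.

```python
from collections import Counter

def bucket(value, width):
    """Map a non-negative integer value to its equi-width bucket index (assumed scheme)."""
    return value // width

def build_histograms(pairs, width):
    """Build a 2-D histogram over (subject, object) buckets plus both 1-D marginals."""
    joint = Counter((bucket(s, width), bucket(o, width)) for s, o in pairs)
    subj = Counter(bucket(s, width) for s, _ in pairs)
    obj = Counter(bucket(o, width) for _, o in pairs)
    return joint, subj, obj

def selectivity_independent(subj, obj, n, sb, ob):
    """Estimate P(sb, ob) as P(sb) * P(ob), i.e. under the independence assumption."""
    return (subj[sb] / n) * (obj[ob] / n)

def selectivity_joint(joint, n, sb, ob):
    """Estimate P(sb, ob) directly from the 2-D histogram, which keeps the correlation."""
    return joint[(sb, ob)] / n

# Hypothetical, perfectly correlated data: every subject value equals its object value.
pairs = [(i, i) for i in range(100)]
joint, subj, obj = build_histograms(pairs, width=10)
n = len(pairs)

indep = selectivity_independent(subj, obj, n, 0, 0)
corr = selectivity_joint(joint, n, 0, 0)
print(f"independence estimate: {indep:.3f}, joint-histogram estimate: {corr:.3f}")
```

On this synthetic data the independence estimate understates the selectivity of the first bucket pair by a factor of ten, which is the kind of error that can lead a cost-based optimizer to a poor plan.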

Publication year:

2010

Keywords:

    Source:

    Scopus

    Document type:

    Conference Object

    Status:

    Restricted access

    Knowledge areas:

    • Data mining
    • Computer science

    Subject areas:

    • Library and archive operations