Source selection and ranking in the WebSemantics architecture using quality of data metadata
Abstract:
The World Wide Web (WWW) has become the preferred medium for the dissemination of information in virtually every domain of activity. Standards and formats for structured data interchange are being developed. However, access to data is still hindered by the challenge of locating data relevant to a particular problem. Further, after a set of relevant sources has been identified, one must still decide which source is best suited for a given task, based on its contents or domain knowledge, and rank these sources appropriately. WWW sources typically cover different domains and may differ considerably with respect to a variety of quality of data (QoD) parameters. Examples of QoD parameters are the completeness of the source contents and the recency of update of the contents. To solve this problem of source selection and ranking, we maintain metadata about source content and quality of data, or scqd metadata. We use a data model for representing scqd metadata similar to those used in a data warehouse environment. The WebSemantics (WS) architecture (G. Mihaila et al., 1998, in Proc. 6th Int. Conf. on Extending Database Technology (EDBT), pp. 87–101; and G. Mihaila et al., 2000, Very Large Database Journal, in press) was developed to solve the task of publishing and locating data sources using the WWW. We first present the WS architecture. We then describe extensions to the WS data model, catalog, and query engine to handle scqd metadata. The query language for source selection and ranking supports both strict and fuzzy matching of scqd metadata. We then present some problems in the efficient management of scqd metadata. We discuss how scqd metadata can be organized in partially ordered sets to support efficient query processing. Some queries cannot be answered by any single source. Thus, we must consider the task of combining multiple scqd's to select combinations of sources in the answer. We consider a number of techniques for combining scqd's. We compare the efficiency of these techniques, as well as the loss of accuracy that some of them may incur when combining scqd metadata. © 2002, Elsevier B.V.
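As a rough illustration of the ideas summarized in the abstract, the sketch below shows strict matching of scqd metadata and a partial order over two QoD parameters. It is not the WS data model or query language: the `Scqd` record, its fields `completeness` and `recency_days`, the helper functions, and the sample catalog are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scqd:
    """Hypothetical scqd record for one source (not the paper's schema)."""
    source: str
    completeness: float  # assumed: fraction of the domain covered, 0..1
    recency_days: int    # assumed: days since the last update


def dominates(a: Scqd, b: Scqd) -> bool:
    """True if a is at least as good as b on both parameters and better on one."""
    at_least = a.completeness >= b.completeness and a.recency_days <= b.recency_days
    strictly = a.completeness > b.completeness or a.recency_days < b.recency_days
    return at_least and strictly


def strict_match(entries, min_completeness, max_recency_days):
    """Strict matching: keep only sources that satisfy every threshold."""
    return [
        e for e in entries
        if e.completeness >= min_completeness and e.recency_days <= max_recency_days
    ]


def maximal(entries):
    """Maximal elements of the partial order: sources no other source dominates."""
    return [e for e in entries if not any(dominates(o, e) for o in entries)]


# Invented sample catalog, for illustration only.
CATALOG = [
    Scqd("A", 0.9, 30),
    Scqd("B", 0.6, 2),
    Scqd("C", 0.5, 45),
]

if __name__ == "__main__":
    print([e.source for e in maximal(CATALOG)])
    print([e.source for e in strict_match(CATALOG, 0.55, 40)])
```

In this sample, sources A and B are incomparable (A is more complete, B is more recent), which is why a partial order rather than a single ranking is the natural structure. Fuzzy matching and the combination of several scqd's, both mentioned in the abstract, are not covered by this sketch.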
Publication year:
2002
Keywords:
Source:

Document type:
Article
Status:
Restricted access
Knowledge areas:
- Semantic Web
- Computer science
Subject areas:
- Operation of libraries and archives