Reducing inconsistency in data warehouses


Abstract:

A data warehouse is a repository of data extracted from different and possibly heterogeneous sources (e.g., databases or files). One of the main problems in integrating databases into a common repository is the possible inconsistency of the values stored in them: the very same term may appear under different values because of misspellings, permuted word order, spelling variants, and so on. In this paper, we present an automatic method for reducing the inconsistency found in existing databases and thus improving data quality. All the values that refer to the same term are clustered by measuring their degree of similarity. Each cluster can then be assigned a common value that, in principle, could replace the original values, making them uniform. The method we propose provides good results with a considerably low error rate.
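
To make the idea of clustering values by similarity concrete, here is a minimal Python sketch. It is not the method from the paper, whose similarity measure and clustering procedure are not given in this record. It assumes a character-based ratio from the standard library's `difflib`, a greedy single-pass clustering, and an arbitrary threshold of 0.85; the function names `normalize`, `similarity`, `cluster_values`, and `canonical` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase, replace punctuation with spaces, and sort the words
    so that a permuted word order does not affect the comparison."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in value.lower())
    return " ".join(sorted(cleaned.split()))


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two values after normalization.
    Assumption: difflib's ratio stands in for the paper's measure."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster_values(values, threshold=0.85):
    """Greedy clustering: put each value in the first cluster whose
    first member is similar enough, otherwise start a new cluster.
    The threshold is an illustrative assumption, not a tuned value."""
    clusters = []
    for value in values:
        for cluster in clusters:
            if similarity(value, cluster[0]) >= threshold:
                cluster.append(value)
                break
        else:
            clusters.append([value])
    return clusters


def canonical(cluster):
    """Pick the most frequent spelling as the cluster's common value."""
    return Counter(cluster).most_common(1)[0][0]


values = [
    "University of Barcelona",
    "Barcelona University of",   # permuted word order
    "Universty of Barcelona",    # misspelling
    "University of Barcelona",
    "MIT",
]
clusters = cluster_values(values)
replacements = {v: canonical(c) for c in clusters for v in c}
```

In this sketch, `replacements` maps every original value to the common value of its cluster, which is the substitution step the abstract describes. Comparing each value only against the first member of a cluster keeps the code short but makes the result depend on input order; a real system would likely need a more robust clustering strategy.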

Publication year:

2001

Keywords:

  • Data warehouses
  • Data cleaning
  • Data integration

Source:

Scopus

Document type:

Conference Object

Status:

Restricted access

Knowledge areas:

  • Databases

Subject areas:

  • Library and archive operations
  • Special computer methods
  • Precision instruments and other devices