Data analysis with intersection graphs


Abstract:

This paper presents a new graph-theoretic framework for multivariate data analysis based on intersection graphs [1]. We call this approach DAIG (Data Analysis with Intersection Graphs). The framework represents data vectors as paths on a graph, which has a number of advantages over the classical tabular representation of data. Each node represents an atom of information, i.e. a pair of a variable and a value, and is associated with the set of observations in which that pair occurs. An edge exists between two nodes whenever the intersection of their respective sets is not empty. We show that this representation of data as an intersection graph allows an easy and intuitive geometric interpretation of individual observations, groups of observations, and the results of multivariate data analysis techniques such as biplots, principal components, cluster analysis, and multidimensional scaling. These appear as paths on the graph, relating variables, values and observations. The approach also allows a compact and memory-efficient representation of data that contain many missing values or multi-valued attributes. The basic principles and advantages of the approach are presented through its application to a simple toy problem, and its main features are illustrated with the aid of software developed specifically for this purpose. © 2013 The Authors. Published by Elsevier B.V.
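
The construction described in the abstract can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' software: the function names `build_intersection_graph` and `observation_path`, the dictionary-based input format, and the toy records are all assumptions made for this example. Missing values are modelled by omitting the variable from a record, and multi-valued attributes by giving a list of values.

```python
from itertools import combinations


def build_intersection_graph(records):
    """Build nodes and edges of an intersection graph (illustrative sketch).

    records maps an observation id to a dict of variable -> value.
    A value may be a list, set or tuple for multi-valued attributes;
    a missing value is expressed by leaving the variable out.
    """
    # Each node is a (variable, value) pair holding the set of
    # observations in which that pair occurs.
    nodes = {}
    for obs_id, record in records.items():
        for variable, value in record.items():
            values = value if isinstance(value, (list, set, tuple)) else [value]
            for v in values:
                nodes.setdefault((variable, v), set()).add(obs_id)

    # An edge joins two nodes whenever their observation sets intersect.
    # The edge stores the shared observations.
    edges = {}
    for a, b in combinations(nodes, 2):
        shared = nodes[a] & nodes[b]
        if shared:
            edges[frozenset((a, b))] = shared
    return nodes, edges


def observation_path(nodes, obs_id):
    """Return the nodes that a single observation passes through."""
    return [node for node, members in nodes.items() if obs_id in members]


# Hypothetical toy data: o3 has a missing "size", o4 has two colors.
records = {
    "o1": {"color": "red", "size": "small"},
    "o2": {"color": "red", "size": "large"},
    "o3": {"color": "blue"},
    "o4": {"color": ["red", "blue"], "size": "large"},
}

nodes, edges = build_intersection_graph(records)
for node in observation_path(nodes, "o4"):
    print(node)
```

In this sketch a missing value simply contributes no node membership, so no placeholder cell is stored, which is one way to read the abstract's claim about compactness for data with many missing values.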

Publication year:

2013

Keywords:

  • Categorical data
  • Data structures
  • Intersection graphs
  • Data models
  • Multivariate Data Analysis

Source:

Scopus

Document type:

Conference Object

Status:

Open access

Knowledge areas:

  • Data analysis
  • Graph theory
  • Mathematical optimization

Subject areas:

  • Computer science