Multi-scale similarities in stochastic neighbour embedding: Reducing dimensionality while preserving both local and global structure


Abstract:

Stochastic neighbour embedding (SNE) and its variants are nonlinear dimensionality reduction methods that use soft Gaussian neighbourhoods to measure similarities between all pairs of data points. To build a suitable embedding, these methods try to reproduce in a low-dimensional space the neighbourhoods that are observed in the high-dimensional data space. Previous works have investigated the immunity of such similarities to norm concentration, as well as enhanced cost functions, such as sums of Jensen-Shannon divergences. This paper proposes an additional refinement, namely multi-scale similarities, which are averages of soft Gaussian neighbourhoods with exponentially growing bandwidths. Such multi-scale similarities can replace the regular, single-scale neighbourhoods in SNE-like methods. Their objective is to maximise the embedding quality on all scales, with the best preservation of both local and global neighbourhoods, and also to exempt the user from having to fix a scale arbitrarily. Experiments with several data sets show that the proposed multi-scale approach better captures the structure of the data and significantly improves the quality of dimensionality reduction.
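The core idea described in the abstract can be sketched as follows: instead of a single Gaussian bandwidth per point, similarities are averaged over several bandwidths that grow exponentially. This is a minimal illustrative sketch, not the paper's exact formulation; the function name and parameters (`n_scales`, `base_sigma`, the factor-of-2 growth) are assumptions for illustration.

```python
import numpy as np

def multiscale_similarities(X, n_scales=4, base_sigma=1.0):
    """Average soft Gaussian neighbourhoods over exponentially
    growing bandwidths sigma_h = base_sigma * 2**h (illustrative
    sketch of the multi-scale idea; parameter choices are assumed)."""
    # Pairwise squared Euclidean distances.
    sq = np.sum(X**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(D, np.inf)  # exclude self-similarity

    P = np.zeros_like(D)
    for h in range(n_scales):
        sigma = base_sigma * 2.0**h           # exponentially growing bandwidth
        W = np.exp(-D / (2.0 * sigma**2))     # soft Gaussian neighbourhood
        W /= W.sum(axis=1, keepdims=True)     # row-normalise into probabilities
        P += W
    return P / n_scales                       # average over all scales
```

Because each per-scale matrix is row-stochastic, the average is as well, so it can stand in for the single-scale neighbourhood probabilities in an SNE-like cost function.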

Year of publication:

2015

Keywords:

  • Jensen-Shannon divergence
  • Manifold learning
  • Stochastic neighbour embedding
  • Data visualisation
  • Nonlinear dimensionality reduction

Source:

Scopus

Document type:

Article

Status:

Restricted access

Knowledge areas:

  • Machine learning
  • Mathematical optimisation

Subject areas:

  • Computer science
  • Special computer methods
  • Natural sciences and mathematics