Clustering Count Data with Stochastic Expectation Propagation


Abstract:

Clustering count vectors is challenging because such data are sparse and high-dimensional. An efficient generative model called EMSD was recently proposed as an exponential-family approximation to the Multinomial Scaled Dirichlet distribution. It has been shown to model sparse count data well and to overcome some limitations of frameworks based on the Dirichlet distribution. In this work, we develop an approximate Bayesian learning framework for the parameters of a finite mixture of EMSD distributions using the Stochastic Expectation Propagation (SEP) approach. SEP maintains a single global posterior approximation that is updated locally, one data point at a time. This reduces memory consumption, which matters when performing inference on large datasets. Experiments on both synthetic and real count data were conducted to validate the effectiveness of the proposed algorithm against traditional learning approaches. Results show that SEP produces estimates comparable to those of the traditional approaches.
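The idea of a global approximation updated locally can be illustrated with a minimal sketch. The example below is not the EMSD mixture algorithm from the paper: it assumes a toy model (inferring the mean of a Gaussian with known noise variance), and the function name `sep_gaussian_mean` and all of its parameters are hypothetical. It shows only the generic SEP bookkeeping: one shared "average" site factor is stored instead of one factor per data point, so memory does not grow with the dataset size.

```python
import numpy as np

def sep_gaussian_mean(x, noise_var=1.0, prior_mean=0.0, prior_var=100.0,
                      passes=50, seed=0):
    """Toy SEP sketch (assumed model): posterior over the mean of a Gaussian.

    Gaussians are stored as natural parameters [precision * mean, precision].
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    prior = np.array([prior_mean / prior_var, 1.0 / prior_var])
    site = np.zeros(2)        # the single shared site factor (all SEP stores)
    q = prior + n * site      # global approximation: prior times site^n

    for _ in range(passes):
        for i in rng.permutation(n):
            # Cavity: remove one copy of the average site from q.
            cavity = q - site
            # Tilted distribution: cavity times the exact likelihood of x[i].
            # In this conjugate toy it is already Gaussian, so moment
            # matching is exact; a real model needs an approximation here.
            tilted = cavity + np.array([x[i] / noise_var, 1.0 / noise_var])
            # Site implied by this data point alone.
            new_site = tilted - cavity
            # Local update: blend it into the shared site with step 1/n.
            site = (1.0 - 1.0 / n) * site + new_site / n
            q = prior + n * site

    return q[0] / q[1], 1.0 / q[1]   # posterior mean and variance
```

Because the shared site is a running average over data points, the posterior mean it returns is close to, but generally not identical to, the exact conjugate posterior mean in this toy setting.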

Year of publication:

2021

Keywords:

  • EMSD distribution
  • Stochastic expectation propagation
  • Mixture model

Source:

Google
Scopus

Document type:

Conference Object

Status:

Restricted access

Knowledge areas:

  • Statistical inference
  • Mathematical optimization

Subject areas:

  • Computer programming, programs, data, security